All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Apparmor move_mount mediation breaks mount tool in containers
       [not found] ` <582eb2e9-ce80-4f96-a4bc-bef1a508e0ab@canonical.com>
@ 2023-12-04  1:34   ` Stéphane Graber
  2023-12-04 13:14     ` John Johansen
  2023-12-05 12:24     ` Linux regression tracking #adding (Thorsten Leemhuis)
  0 siblings, 2 replies; 17+ messages in thread
From: Stéphane Graber @ 2023-12-04  1:34 UTC (permalink / raw)
  To: John Johansen
  Cc: Christian Brauner, Sasha Levin, Aleksa Sarai,
	Alexander Mihalicyn, regressions

On Sun, Dec 3, 2023 at 8:21 PM John Johansen
<john.johansen@canonical.com> wrote:
>
> On 12/2/23 17:20, Stéphane Graber wrote:
> > Hey John,
> >
> > Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
> > landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
> > blocking new mounts for all LXC, LXD and Incus users (at least) on
> > distributions using the newer version of util-linux.
> >
> > That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
> > the new mount command now performs:
> > ```
> > fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
> > fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
> > fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
> > statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
> > {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
> > stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
> > stx_size=40, ...}) = 0
> > move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
> > ```
> >
> > That last call to "move_mount" is incorrectly interpreted by AppArmor
> > as an attempt to move-mount "/" to "/mnt" rather than as a new mount
> > being created, this therefore results in:
> > ```
> > Dec 03 01:05:03 kernel-test kernel: audit: type=1400
> > audit(1701565503.599:34): apparmor="DENIED" operation="mount"
> > class="mount" info="failed perms check" error=-13
> > profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
> > srcname="/" flags="rw, move"
> > ```
> >
> > Note that the flags here show "move", the fstype isn't even set and
> > the source path at srcname incorrectly shows "/".
> >
> > This operation therefore trips any container manager which has an
> > apparmor security policy preventing arbitrary move-mount (as those
> > could be used to bypass other apparmor path based policies).
> >
> >
> > The way I see it, the current mediation support effectively breaks any
> > attempt at mediating mounts in a useful way in apparmor as it's now
> > impossible to mediate new mounts based on their fstype or even
> > distinguish them from a move-mount operation.
> >
> Indeed it is a far from good solution. It is a stop gap.
> >
> > I don't know if this warrants pulling the mediation patch out of
> > stable (and out of linus' tree), obviously doing that would
> > reintroduce that hole in mount coverage, but at the same time, the
> > current coverage is broken enough that our only alternative is to
> > effectively allow all mounts, making the current mediation useless.
>
> pulling it effectively means ALL applications by-pass mediation, the
> alternative is to block all applications from using the move_mount
> system call as part of mediation. Which might have been acceptable
> as a stop gap when the syscall first landed but not now.

Pulling it from the stable branch may still make sense, you now have
folks who are updating to get actual bugfixes and end up with broken
containers, that doesn't exactly seem like a good outcome...

> The situation can be broken down like this
> unconfined applications have no restrictions, this doesn't affect
> hem.
>
> confined applications using the old mount interface continue to
> work as expected.
>
> confined applications using the new mount interface now are blocked
> unless there is, admittedly, a very broad mount rule granted. This
> is however better than the old situation in which confined
> applications where not mediated at all.

It depends from your point of view. If you are using apparmor around a
single application and have tailored rules for exactly what it is
doing, then yes, I agree.
But for all cases where apparmor is used as a safety net to block
potentially dangerous/problematic actions, this is causing the kind of
breakage which will cause people to just plain turn off apparmor.

> The regression is in the policy, where an application like LXD/Incus
> are specify mount rules and loading policy based on it. Those
> restrictions continue to work on the old mount interface, they
> however do not work as one would expect when the new mount interface
> is used. To allow applications to use the new mount interface
> a broad mount rule is needed, which basically makes the other
> mount restrictions useless.
>
> This situation is however still better than without the patch
> because, applications trying to use the new mount interface
> were not restricted at all. The only regression is in mediation
> of applications that are using the old mount interface. But
> again those application, without the patch, can just use the
> new interface and by-pass the restrictions from those rules. In
> this case the broad rule is not good, but better than the by-pass.
>
> I am working on a patch that will improve the situation, over the
> current patch using the existing move_mount hook. I wanted to
> be able to get more testing in place, before we did more, and
> just didn't have the time.

So just to understand, this change went upstream while it was known
that effectively no sane mount policy would work with it?

That seems very odd, at that point, why not instead upstream a patch
that disables the new mount API when apparmor is in use with mount
rules?
That feels like it would have had the same theoretical benefits
without the breakage that this change caused.

> But the reality is, apparmor won't be able to achieve equivalent
> mediation without new hooks or at least more mount context passed
> into the existing hook. That to is being looked at but won't be
> is even further out.

That doesn't sound encouraging as far as getting a suitable fix to our users...


Is there even a way in apparmor policies to cleanly prevent the use of
the new mount API (return ENOSYS back to userspace)?
We can obviously do this through a separate seccomp filter, but not
everyone is using seccomp...

Stéphane

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-04  1:34   ` Apparmor move_mount mediation breaks mount tool in containers Stéphane Graber
@ 2023-12-04 13:14     ` John Johansen
  2023-12-04 14:20       ` Linux regression tracking (Thorsten Leemhuis)
  2023-12-04 19:39       ` Sasha Levin
  2023-12-05 12:24     ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 2 replies; 17+ messages in thread
From: John Johansen @ 2023-12-04 13:14 UTC (permalink / raw)
  To: Stéphane Graber
  Cc: Christian Brauner, Sasha Levin, Aleksa Sarai,
	Alexander Mihalicyn, regressions

On 12/3/23 17:34, Stéphane Graber wrote:
> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
> <john.johansen@canonical.com> wrote:
>>
>> On 12/2/23 17:20, Stéphane Graber wrote:
>>> Hey John,
>>>
>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>> distributions using the newer version of util-linux.
>>>
>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>> the new mount command now performs:
>>> ```
>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>> stx_size=40, ...}) = 0
>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>> ```
>>>
>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>> being created, this therefore results in:
>>> ```
>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>> class="mount" info="failed perms check" error=-13
>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>> srcname="/" flags="rw, move"
>>> ```
>>>
>>> Note that the flags here show "move", the fstype isn't even set and
>>> the source path at srcname incorrectly shows "/".
>>>
>>> This operation therefore trips any container manager which has an
>>> apparmor security policy preventing arbitrary move-mount (as those
>>> could be used to bypass other apparmor path based policies).
>>>
>>>
>>> The way I see it, the current mediation support effectively breaks any
>>> attempt at mediating mounts in a useful way in apparmor as it's now
>>> impossible to mediate new mounts based on their fstype or even
>>> distinguish them from a move-mount operation.
>>>
>> Indeed it is a far from good solution. It is a stop gap.
>>>
>>> I don't know if this warrants pulling the mediation patch out of
>>> stable (and out of linus' tree), obviously doing that would
>>> reintroduce that hole in mount coverage, but at the same time, the
>>> current coverage is broken enough that our only alternative is to
>>> effectively allow all mounts, making the current mediation useless.
>>
>> pulling it effectively means ALL applications by-pass mediation, the
>> alternative is to block all applications from using the move_mount
>> system call as part of mediation. Which might have been acceptable
>> as a stop gap when the syscall first landed but not now.
> 
> Pulling it from the stable branch may still make sense, you now have
> folks who are updating to get actual bugfixes and end up with broken
> containers, that doesn't exactly seem like a good outcome...
> 

I will defer such a decision to the maintainers the stable trees. I
can see arguments either way.

But I will note that even if we had been able perfectly match the
new interface mediation to the old breakage is still possible because
containers could have been using new interface to do things not
covered by the old interface rules.

Not that it matters to the upstream kernel maintainers but a CVE
is coming for this and that having this unmediated in the tree isn't
good either.

>> The situation can be broken down like this
>> unconfined applications have no restrictions, this doesn't affect
>> hem.
>>
>> confined applications using the old mount interface continue to
>> work as expected.
>>
>> confined applications using the new mount interface now are blocked
>> unless there is, admittedly, a very broad mount rule granted. This
>> is however better than the old situation in which confined
>> applications where not mediated at all.
> 
> It depends from your point of view. If you are using apparmor around a
> single application and have tailored rules for exactly what it is
> doing, then yes, I agree.
> But for all cases where apparmor is used as a safety net to block
> potentially dangerous/problematic actions, this is causing the kind of
> breakage which will cause people to just plain turn off apparmor.
> 

Only in the sense that closing a security regression causes breakage.

Lets cover the cases.
Policy is not updated, and application uses the old mount interface:
this continues to work.

*Policy is not updated, and application uses the new mount interface:
the access is likely blocked (the security hole has been closed)
unless the policy already allowed a generic
   allow mount,

+Policy is updated with generic
   allow mount,
and application uses the old mount interface:
this continues to work, security restrictions are reduced on the
old interface, but policy reflects what can actually be enforced
around mount, because if the application is compromised and uses
the new mount interface this is what can be mediated.

Policy is updated with generic
   allow mount,
and application uses the new mount interface:
The application continues to work.

* The breakage case is where the application was using the new
interface and not being mediated. While being able to mediate
the new interface to the same level of the old would help, it
does not guarantee this case won't break, as the application
could be using to do things that are not restricted by current
policy.

+ The regressed mediation case, is only sort of reduced mediation.
The policy can maintain all existing mount rules so that info
is not lost going forward.
The mediation on the old mount interface is reduced by the new
mount rule, but at the same time it now possible to turn of
the new mount interface, if mount is not being used by the
application.

>> The regression is in the policy, where an application like LXD/Incus
>> are specify mount rules and loading policy based on it. Those
>> restrictions continue to work on the old mount interface, they
>> however do not work as one would expect when the new mount interface
>> is used. To allow applications to use the new mount interface
>> a broad mount rule is needed, which basically makes the other
>> mount restrictions useless.
>>
>> This situation is however still better than without the patch
>> because, applications trying to use the new mount interface
>> were not restricted at all. The only regression is in mediation
>> of applications that are using the old mount interface. But
>> again those application, without the patch, can just use the
>> new interface and by-pass the restrictions from those rules. In
>> this case the broad rule is not good, but better than the by-pass.
>>
>> I am working on a patch that will improve the situation, over the
>> current patch using the existing move_mount hook. I wanted to
>> be able to get more testing in place, before we did more, and
>> just didn't have the time.
> 
> So just to understand, this change went upstream while it was known
> that effectively no sane mount policy would work with it?
> 

having a toggle to turn off mount is sane. It actually is a little
better than that in that you can do more with some cases. It still
allows full mediation under the old interface, it just can't match
how fine the mediation is with the old interface.

> That seems very odd, at that point, why not instead upstream a patch
> that disables the new mount API when apparmor is in use with mount
> rules?

It breaks applications, some will fallback, some don't. Here you
get a policy controllable toggle. Admittedly it returns -EACCES
instead of a different error code but that could be adjusted.

> That feels like it would have had the same theoretical benefits
> without the breakage that this change caused.
> 
I am willing to experiment with this more but it can break
applications.

>> But the reality is, apparmor won't be able to achieve equivalent
>> mediation without new hooks or at least more mount context passed
>> into the existing hook. That to is being looked at but won't be
>> is even further out.
> 
> That doesn't sound encouraging as far as getting a suitable fix to our users...
> 
If by suitable fix you mean equivalence of mediation between the
old and new interface, not in the short term.

> 
> Is there even a way in apparmor policies to cleanly prevent the use of
> the new mount API (return ENOSYS back to userspace)?

Its possible, it will take another patch or two kernel side for this
case. We could even add a boolean and make it so you can apply mount
rules specifically only to the old or new interface.

> We can obviously do this through a separate seccomp filter, but not
> everyone is using seccomp...
> 
> Stéphane


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-04 13:14     ` John Johansen
@ 2023-12-04 14:20       ` Linux regression tracking (Thorsten Leemhuis)
  2023-12-05  6:57         ` Stéphane Graber
  2023-12-04 19:39       ` Sasha Levin
  1 sibling, 1 reply; 17+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-12-04 14:20 UTC (permalink / raw)
  To: John Johansen, Stéphane Graber
  Cc: Christian Brauner, Sasha Levin, Aleksa Sarai,
	Alexander Mihalicyn, regressions

On 04.12.23 14:14, John Johansen wrote:
> On 12/3/23 17:34, Stéphane Graber wrote:

Side note: Stéphane, thx for CCing the regressions list.

>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
>> <john.johansen@canonical.com> wrote:
>>> On 12/2/23 17:20, Stéphane Graber wrote:
>>>>
>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>>> distributions using the newer version of util-linux.
>>>>
>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>>> the new mount command now performs:
>>>> ```
>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>>> stx_size=40, ...}) = 0
>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>>> ```
>>>>
>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>>> being created, this therefore results in:
>>>> ```
>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>>> class="mount" info="failed perms check" error=-13
>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>>> srcname="/" flags="rw, move"
>>>> ```
>>>>
>>>> Note that the flags here show "move", the fstype isn't even set and
>>>> the source path at srcname incorrectly shows "/".
>>>>
>>>> This operation therefore trips any container manager which has an
>>>> apparmor security policy preventing arbitrary move-mount (as those
>>>> could be used to bypass other apparmor path based policies).
>>>>
>>>>
>>>> The way I see it, the current mediation support effectively breaks any
>>>> attempt at mediating mounts in a useful way in apparmor as it's now
>>>> impossible to mediate new mounts based on their fstype or even
>>>> distinguish them from a move-mount operation.
>>>>
>>> Indeed it is a far from good solution. It is a stop gap.
>>>>
>>>> I don't know if this warrants pulling the mediation patch out of
>>>> stable (and out of linus' tree), obviously doing that would
>>>> reintroduce that hole in mount coverage, but at the same time, the
>>>> current coverage is broken enough that our only alternative is to
>>>> effectively allow all mounts, making the current mediation useless.
>>>
>>> pulling it effectively means ALL applications by-pass mediation, the
>>> alternative is to block all applications from using the move_mount
>>> system call as part of mediation. Which might have been acceptable
>>> as a stop gap when the syscall first landed but not now.
>>
>> Pulling it from the stable branch may still make sense, you now have
>> folks who are updating to get actual bugfixes and end up with broken
>> containers, that doesn't exactly seem like a good outcome...
> 
> I will defer such a decision to the maintainers the stable trees. I
> can see arguments either way.

FWIW, Greg usually does not revert a backported change if the change
causes the same problem to happen with mainline, as then it should be
fixed there as well. Which is normally the right thing to do, as Linus
wants people to be able to upgrade to new kernels without having them to
upgrade anything else (firmware, anything in userland incl. apparmor
policies).

So normally this whole issue afaics would be "157a3537d6bc28 needs to be
fixed in mainline (if nothing else helps with a revert), and that fix
then needs to be backported to the various stable trees".

It obviously is more complicated because that commit apparently fixes a
security vulnerability. But even under such circumstances Linus afaik
wants us to do everything possible to avoid breaking peoples workflows.
Which is why I wonder if there is a more elegant solution here
somewhere. I doubt that the answer is yes, but I'll ask the following
anyway: Would a kernel config option that distros could set when they
updated their Apparmor policies help here? Or something in sysfs?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> But I will note that even if we had been able perfectly match the
> new interface mediation to the old breakage is still possible because
> containers could have been using new interface to do things not
> covered by the old interface rules.
> 
> Not that it matters to the upstream kernel maintainers but a CVE
> is coming for this and that having this unmediated in the tree isn't
> good either.
> 
>>> The situation can be broken down like this
>>> unconfined applications have no restrictions, this doesn't affect
>>> hem.
>>>
>>> confined applications using the old mount interface continue to
>>> work as expected.
>>>
>>> confined applications using the new mount interface now are blocked
>>> unless there is, admittedly, a very broad mount rule granted. This
>>> is however better than the old situation in which confined
>>> applications where not mediated at all.
>>
>> It depends from your point of view. If you are using apparmor around a
>> single application and have tailored rules for exactly what it is
>> doing, then yes, I agree.
>> But for all cases where apparmor is used as a safety net to block
>> potentially dangerous/problematic actions, this is causing the kind of
>> breakage which will cause people to just plain turn off apparmor.
>>
> 
> Only in the sense that closing a security regression causes breakage.
> 
> Lets cover the cases.
> Policy is not updated, and application uses the old mount interface:
> this continues to work.
> 
> *Policy is not updated, and application uses the new mount interface:
> the access is likely blocked (the security hole has been closed)
> unless the policy already allowed a generic
>   allow mount,
> 
> +Policy is updated with generic
>   allow mount,
> and application uses the old mount interface:
> this continues to work, security restrictions are reduced on the
> old interface, but policy reflects what can actually be enforced
> around mount, because if the application is compromised and uses
> the new mount interface this is what can be mediated.
> 
> Policy is updated with generic
>   allow mount,
> and application uses the new mount interface:
> The application continues to work.
> 
> * The breakage case is where the application was using the new
> interface and not being mediated. While being able to mediate
> the new interface to the same level of the old would help, it
> does not guarantee this case won't break, as the application
> could be using to do things that are not restricted by current
> policy.
> 
> + The regressed mediation case, is only sort of reduced mediation.
> The policy can maintain all existing mount rules so that info
> is not lost going forward.
> The mediation on the old mount interface is reduced by the new
> mount rule, but at the same time it now possible to turn of
> the new mount interface, if mount is not being used by the
> application.
> 
>>> The regression is in the policy, where an application like LXD/Incus
>>> are specify mount rules and loading policy based on it. Those
>>> restrictions continue to work on the old mount interface, they
>>> however do not work as one would expect when the new mount interface
>>> is used. To allow applications to use the new mount interface
>>> a broad mount rule is needed, which basically makes the other
>>> mount restrictions useless.
>>>
>>> This situation is however still better than without the patch
>>> because, applications trying to use the new mount interface
>>> were not restricted at all. The only regression is in mediation
>>> of applications that are using the old mount interface. But
>>> again those application, without the patch, can just use the
>>> new interface and by-pass the restrictions from those rules. In
>>> this case the broad rule is not good, but better than the by-pass.
>>>
>>> I am working on a patch that will improve the situation, over the
>>> current patch using the existing move_mount hook. I wanted to
>>> be able to get more testing in place, before we did more, and
>>> just didn't have the time.
>>
>> So just to understand, this change went upstream while it was known
>> that effectively no sane mount policy would work with it?
>>
> 
> having a toggle to turn off mount is sane. It actually is a little
> better than that in that you can do more with some cases. It still
> allows full mediation under the old interface, it just can't match
> how fine the mediation is with the old interface.
> 
>> That seems very odd, at that point, why not instead upstream a patch
>> that disables the new mount API when apparmor is in use with mount
>> rules?
> 
> It breaks applications, some will fallback, some don't. Here you
> get a policy controllable toggle. Admittedly it returns -EACCES
> instead of a different error code but that could be adjusted.
> 
>> That feels like it would have had the same theoretical benefits
>> without the breakage that this change caused.
>>
> I am willing to experiment with this more but it can break
> applications.
> 
>>> But the reality is, apparmor won't be able to achieve equivalent
>>> mediation without new hooks or at least more mount context passed
>>> into the existing hook. That to is being looked at but won't be
>>> is even further out.
>>
>> That doesn't sound encouraging as far as getting a suitable fix to our
>> users...
>>
> If by suitable fix you mean equivalence of mediation between the
> old and new interface, not in the short term.
> 
>>
>> Is there even a way in apparmor policies to cleanly prevent the use of
>> the new mount API (return ENOSYS back to userspace)?
> 
> Its possible, it will take another patch or two kernel side for this
> case. We could even add a boolean and make it so you can apply mount
> rules specifically only to the old or new interface.
> 
>> We can obviously do this through a separate seccomp filter, but not
>> everyone is using seccomp...
>>
>> Stéphane
> 
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-04 13:14     ` John Johansen
  2023-12-04 14:20       ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-12-04 19:39       ` Sasha Levin
  2023-12-04 20:35         ` John Johansen
  1 sibling, 1 reply; 17+ messages in thread
From: Sasha Levin @ 2023-12-04 19:39 UTC (permalink / raw)
  To: John Johansen
  Cc: Stéphane Graber, Christian Brauner, Aleksa Sarai,
	Alexander Mihalicyn, regressions

On Mon, Dec 04, 2023 at 05:14:11AM -0800, John Johansen wrote:
>On 12/3/23 17:34, Stéphane Graber wrote:
>>On Sun, Dec 3, 2023 at 8:21 PM John Johansen
>><john.johansen@canonical.com> wrote:
>>>
>>>On 12/2/23 17:20, Stéphane Graber wrote:
>>>>Hey John,
>>>>
>>>>Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>>>landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>>>blocking new mounts for all LXC, LXD and Incus users (at least) on
>>>>distributions using the newer version of util-linux.
>>>>
>>>>That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>>>the new mount command now performs:
>>>>```
>>>>fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>>>fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>>>fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>>>statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>>>{stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>>>stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>>>stx_size=40, ...}) = 0
>>>>move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>>>```
>>>>
>>>>That last call to "move_mount" is incorrectly interpreted by AppArmor
>>>>as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>>>being created, this therefore results in:
>>>>```
>>>>Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>>>audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>>>class="mount" info="failed perms check" error=-13
>>>>profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>>>srcname="/" flags="rw, move"
>>>>```
>>>>
>>>>Note that the flags here show "move", the fstype isn't even set and
>>>>the source path at srcname incorrectly shows "/".
>>>>
>>>>This operation therefore trips any container manager which has an
>>>>apparmor security policy preventing arbitrary move-mount (as those
>>>>could be used to bypass other apparmor path based policies).
>>>>
>>>>
>>>>The way I see it, the current mediation support effectively breaks any
>>>>attempt at mediating mounts in a useful way in apparmor as it's now
>>>>impossible to mediate new mounts based on their fstype or even
>>>>distinguish them from a move-mount operation.
>>>>
>>>Indeed it is a far from good solution. It is a stop gap.
>>>>
>>>>I don't know if this warrants pulling the mediation patch out of
>>>>stable (and out of linus' tree), obviously doing that would
>>>>reintroduce that hole in mount coverage, but at the same time, the
>>>>current coverage is broken enough that our only alternative is to
>>>>effectively allow all mounts, making the current mediation useless.
>>>
>>>pulling it effectively means ALL applications by-pass mediation, the
>>>alternative is to block all applications from using the move_mount
>>>system call as part of mediation. Which might have been acceptable
>>>as a stop gap when the syscall first landed but not now.
>>
>>Pulling it from the stable branch may still make sense, you now have
>>folks who are updating to get actual bugfixes and end up with broken
>>containers, that doesn't exactly seem like a good outcome...
>>
>
>I will defer such a decision to the maintainers the stable trees. I
>can see arguments either way.

This decision isn't up to the stable tree: we want to stay "bug
compatible" with upstream, and thus if this issue also exists upstream
(and it sounds like it does), we'll keep this patch in the stable trees.

If a revert was to be submitted to Linus, we'll take the same revert to
the stable trees.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-04 19:39       ` Sasha Levin
@ 2023-12-04 20:35         ` John Johansen
  0 siblings, 0 replies; 17+ messages in thread
From: John Johansen @ 2023-12-04 20:35 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Stéphane Graber, Christian Brauner, Aleksa Sarai,
	Alexander Mihalicyn, regressions

On 12/4/23 11:39, Sasha Levin wrote:
> On Mon, Dec 04, 2023 at 05:14:11AM -0800, John Johansen wrote:
>> On 12/3/23 17:34, Stéphane Graber wrote:
>>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
>>> <john.johansen@canonical.com> wrote:
>>>>
>>>> On 12/2/23 17:20, Stéphane Graber wrote:
>>>>> Hey John,
>>>>>
>>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>>>> distributions using the newer version of util-linux.
>>>>>
>>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>>>> the new mount command now performs:
>>>>> ```
>>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>>>> stx_size=40, ...}) = 0
>>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>>>> ```
>>>>>
>>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>>>> being created, this therefore results in:
>>>>> ```
>>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>>>> class="mount" info="failed perms check" error=-13
>>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>>>> srcname="/" flags="rw, move"
>>>>> ```
>>>>>
>>>>> Note that the flags here show "move", the fstype isn't even set and
>>>>> the source path at srcname incorrectly shows "/".
>>>>>
>>>>> This operation therefore trips any container manager which has an
>>>>> apparmor security policy preventing arbitrary move-mount (as those
>>>>> could be used to bypass other apparmor path based policies).
>>>>>
>>>>>
>>>>> The way I see it, the current mediation support effectively breaks any
>>>>> attempt at mediating mounts in a useful way in apparmor as it's now
>>>>> impossible to mediate new mounts based on their fstype or even
>>>>> distinguish them from a move-mount operation.
>>>>>
>>>> Indeed it is a far from good solution. It is a stop gap.
>>>>>
>>>>> I don't know if this warrants pulling the mediation patch out of
>>>>> stable (and out of linus' tree), obviously doing that would
>>>>> reintroduce that hole in mount coverage, but at the same time, the
>>>>> current coverage is broken enough that our only alternative is to
>>>>> effectively allow all mounts, making the current mediation useless.
>>>>
>>>> pulling it effectively means ALL applications by-pass mediation, the
>>>> alternative is to block all applications from using the move_mount
>>>> system call as part of mediation. Which might have been acceptable
>>>> as a stop gap when the syscall first landed but not now.
>>>
>>> Pulling it from the stable branch may still make sense, you now have
>>> folks who are updating to get actual bugfixes and end up with broken
>>> containers, that doesn't exactly seem like a good outcome...
>>>
>>
>> I will defer such a decision to the maintainers the stable trees. I
>> can see arguments either way.
> 
> This decision isn't up to the stable tree: we want to stay "bug
> compatible" with upstream, and thus if this issue also exists upstream
> (and it sounds like it does), we'll keep this patch in the stable trees.
> 
> If a revert was to be submitted to Linus, we'll take the same revert to
> the stable trees.
> 
ah, thanks for he clarification Sasha


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-04 14:20       ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-12-05  6:57         ` Stéphane Graber
  2023-12-05  8:45           ` Linux regression tracking (Thorsten Leemhuis)
  2023-12-05 19:55           ` John Johansen
  0 siblings, 2 replies; 17+ messages in thread
From: Stéphane Graber @ 2023-12-05  6:57 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: John Johansen, Christian Brauner, Sasha Levin, Aleksa Sarai,
	Alexander Mihalicyn, paul, James Morris, Serge Hallyn,
	linux-security-module

On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> On 04.12.23 14:14, John Johansen wrote:
> > On 12/3/23 17:34, Stéphane Graber wrote:
>
> Side note: Stéphane, thx for CCing the regressions list.
>
> >> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
> >> <john.johansen@canonical.com> wrote:
> >>> On 12/2/23 17:20, Stéphane Graber wrote:
> >>>>
> >>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
> >>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
> >>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
> >>>> distributions using the newer version of util-linux.
> >>>>
> >>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
> >>>> the new mount command now performs:
> >>>> ```
> >>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
> >>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
> >>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
> >>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
> >>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
> >>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
> >>>> stx_size=40, ...}) = 0
> >>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
> >>>> ```
> >>>>
> >>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
> >>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
> >>>> being created, this therefore results in:
> >>>> ```
> >>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
> >>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
> >>>> class="mount" info="failed perms check" error=-13
> >>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
> >>>> srcname="/" flags="rw, move"
> >>>> ```
> >>>>
> >>>> Note that the flags here show "move", the fstype isn't even set and
> >>>> the source path at srcname incorrectly shows "/".
> >>>>
> >>>> This operation therefore trips any container manager which has an
> >>>> apparmor security policy preventing arbitrary move-mount (as those
> >>>> could be used to bypass other apparmor path based policies).
> >>>>
> >>>>
> >>>> The way I see it, the current mediation support effectively breaks any
> >>>> attempt at mediating mounts in a useful way in apparmor as it's now
> >>>> impossible to mediate new mounts based on their fstype or even
> >>>> distinguish them from a move-mount operation.
> >>>>
> >>> Indeed it is a far from good solution. It is a stop gap.
> >>>>
> >>>> I don't know if this warrants pulling the mediation patch out of
> >>>> stable (and out of linus' tree), obviously doing that would
> >>>> reintroduce that hole in mount coverage, but at the same time, the
> >>>> current coverage is broken enough that our only alternative is to
> >>>> effectively allow all mounts, making the current mediation useless.
> >>>
> >>> pulling it effectively means ALL applications by-pass mediation, the
> >>> alternative is to block all applications from using the move_mount
> >>> system call as part of mediation. Which might have been acceptable
> >>> as a stop gap when the syscall first landed but not now.
> >>
> >> Pulling it from the stable branch may still make sense, you now have
> >> folks who are updating to get actual bugfixes and end up with broken
> >> containers, that doesn't exactly seem like a good outcome...
> >
> > I will defer such a decision to the maintainers the stable trees. I
> > can see arguments either way.
>
> FWIW, Greg usually does not revert a backported change if the change
> causes the same problem to happen with mainline, as then it should be
> fixed there as well. Which is normally the right thing to do, as Linus
> wants people to be able to upgrade to new kernels without having them to
> upgrade anything else (firmware, anything in userland incl. apparmor
> policies).
>
> So normally this whole issue afaics would be "157a3537d6bc28 needs to be
> fixed in mainline (if nothing else helps with a revert), and that fix
> then needs to be backported to the various stable trees".
>
> It obviously is more complicated because that commit apparently fixes a
> security vulnerability. But even under such circumstances Linus afaik
> wants us to do everything possible to avoid breaking peoples workflows.
> Which is why I wonder if there is a more elegant solution here
> somewhere. I doubt that the answer is yes, but I'll ask the following
> anyway: Would a kernel config option that distros could set when they
> updated their Apparmor policies help here? Or something in sysfs?
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

So there are a few different scenarios here:

1) Distribution shipping application specific profiles that were
developed back when the new VFS mount API wasn't around, those
profiles allow the specific mount calls made by the application. The
application has since moved on to using the VFS mount API and thanks
to the lack of apparmor coverage, things have kept on working fine.
Now when the distro pulls this bugfix, the application will now fail
to perform the mount. Apparmor returns EACCESS which doesn't cause a
fallback to the old mount API but instead a straight up error returned
to the user. ENOSYS would usually be handled properly.

2) All the container managers out there which use AppArmor as a safety
net to block mostly misbehaving or misconfigured software. Those
policies are not meant to be water tight to begin with (and due to the
nature of containers, they really can't be). But they do catch a lot
and while not an actual security barrier, they certainly prevent a lot
of accidents. Unfortunately with the new mediation in place, the only
real way to operate moving forward is to not mediate mounts at all as
mediation of the new VFS API leads to just about everything getting
denied and apparmor doesn't provide a way to separately mediate the
old and new mount APIs.

3) A system where apparmor is used to effectively prevent any mounting
whatsoever, where the policy is now being bypassed by simply using the
new VFS API. Such a system would also need to not be using Seccomp in
combination with Apparmor as blocking the new VFS API syscalls would
avoid the issue.


I suspect 3) is what John is hinting at in this thread. I could
certainly see this bug be used to bypass the snapd mount restrictions
for example, though snapd also generates seccomp policies, so just
blocking the new VFS API would be a much simpler fix there.

I have no idea how common 1) is, but it looks like we're about to find
out soon and that has potential for some very "interesting" bug
reports as mount related errors are usually not the easiest to
understand and LSMs make them so much worse.

As for 2), as it stands, we're going to have to effectively turn off
any of the mount related safety nets we had in place in LXC, Incus and
likely most other container runtimes out there to handle this. The
usual issue with doing that kind of heavy hammer change is that once
it's done, it will take a long time to undo it. That is, once AppArmor
is actually capable of mediating the way it's supposed to, a lot of
the profiles will have been changed to just blanket allow all mounts
and they're only ever going to be changed once we're confident that
the vast majority of our users are running a kernel with the fixed
logic.

I guess some of that could be handled better if there was a way to
detect the current broken mediation of the new mount API and then
later again, detect that the kernel now has working mediation,
allowing for those container managers to generate accurate profile
rather than have to go for the "safest" option (as far as keeping
users running) and just keep allowing everything.

John, is there any such detection mechanism currently present that
could be used by userspace to better handle this situation?


Overall, this still feels very rushed and broken to me. I understand
that being able to trivially bypass AppArmor mount rules isn't great,
but shipping code that makes the vast majority of said rules useless
doesn't really feel like such an improvement. It's effectively turning
what was a reasonably flexible policy engine around mount calls, into
a binary switch to either allow everything or block everything.

We at the very least need a way to check whether we're dealing with
this known broken state so that any change that's made to userspace
can be made condition on this, ensuring that once mediation actually
works as intended, those policies also go back to the way they're
supposed to be.

Stéphane

> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> > But I will note that even if we had been able perfectly match the
> > new interface mediation to the old breakage is still possible because
> > containers could have been using new interface to do things not
> > covered by the old interface rules.
> >
> > Not that it matters to the upstream kernel maintainers but a CVE
> > is coming for this and that having this unmediated in the tree isn't
> > good either.
> >
> >>> The situation can be broken down like this
> >>> unconfined applications have no restrictions, this doesn't affect
> >>> hem.
> >>>
> >>> confined applications using the old mount interface continue to
> >>> work as expected.
> >>>
> >>> confined applications using the new mount interface now are blocked
> >>> unless there is, admittedly, a very broad mount rule granted. This
> >>> is however better than the old situation in which confined
> >>> applications where not mediated at all.
> >>
> >> It depends from your point of view. If you are using apparmor around a
> >> single application and have tailored rules for exactly what it is
> >> doing, then yes, I agree.
> >> But for all cases where apparmor is used as a safety net to block
> >> potentially dangerous/problematic actions, this is causing the kind of
> >> breakage which will cause people to just plain turn off apparmor.
> >>
> >
> > Only in the sense that closing a security regression causes breakage.
> >
> > Lets cover the cases.
> > Policy is not updated, and application uses the old mount interface:
> > this continues to work.
> >
> > *Policy is not updated, and application uses the new mount interface:
> > the access is likely blocked (the security hole has been closed)
> > unless the policy already allowed a generic
> >   allow mount,
> >
> > +Policy is updated with generic
> >   allow mount,
> > and application uses the old mount interface:
> > this continues to work, security restrictions are reduced on the
> > old interface, but policy reflects what can actually be enforced
> > around mount, because if the application is compromised and uses
> > the new mount interface this is what can be mediated.
> >
> > Policy is updated with generic
> >   allow mount,
> > and application uses the new mount interface:
> > The application continues to work.
> >
> > * The breakage case is where the application was using the new
> > interface and not being mediated. While being able to mediate
> > the new interface to the same level of the old would help, it
> > does not guarantee this case won't break, as the application
> > could be using to do things that are not restricted by current
> > policy.
> >
> > + The regressed mediation case, is only sort of reduced mediation.
> > The policy can maintain all existing mount rules so that info
> > is not lost going forward.
> > The mediation on the old mount interface is reduced by the new
> > mount rule, but at the same time it now possible to turn of
> > the new mount interface, if mount is not being used by the
> > application.
> >
> >>> The regression is in the policy, where an application like LXD/Incus
> >>> are specify mount rules and loading policy based on it. Those
> >>> restrictions continue to work on the old mount interface, they
> >>> however do not work as one would expect when the new mount interface
> >>> is used. To allow applications to use the new mount interface
> >>> a broad mount rule is needed, which basically makes the other
> >>> mount restrictions useless.
> >>>
> >>> This situation is however still better than without the patch
> >>> because, applications trying to use the new mount interface
> >>> were not restricted at all. The only regression is in mediation
> >>> of applications that are using the old mount interface. But
> >>> again those application, without the patch, can just use the
> >>> new interface and by-pass the restrictions from those rules. In
> >>> this case the broad rule is not good, but better than the by-pass.
> >>>
> >>> I am working on a patch that will improve the situation, over the
> >>> current patch using the existing move_mount hook. I wanted to
> >>> be able to get more testing in place, before we did more, and
> >>> just didn't have the time.
> >>
> >> So just to understand, this change went upstream while it was known
> >> that effectively no sane mount policy would work with it?
> >>
> >
> > having a toggle to turn off mount is sane. It actually is a little
> > better than that in that you can do more with some cases. It still
> > allows full mediation under the old interface, it just can't match
> > how fine the mediation is with the old interface.
> >
> >> That seems very odd, at that point, why not instead upstream a patch
> >> that disables the new mount API when apparmor is in use with mount
> >> rules?
> >
> > It breaks applications, some will fallback, some don't. Here you
> > get a policy controllable toggle. Admittedly it returns -EACCES
> > instead of a different error code but that could be adjusted.
> >
> >> That feels like it would have had the same theoretical benefits
> >> without the breakage that this change caused.
> >>
> > I am willing to experiment with this more but it can break
> > applications.
> >
> >>> But the reality is, apparmor won't be able to achieve equivalent
> >>> mediation without new hooks or at least more mount context passed
> >>> into the existing hook. That to is being looked at but won't be
> >>> is even further out.
> >>
> >> That doesn't sound encouraging as far as getting a suitable fix to our
> >> users...
> >>
> > If by suitable fix you mean equivalence of mediation between the
> > old and new interface, not in the short term.
> >
> >>
> >> Is there even a way in apparmor policies to cleanly prevent the use of
> >> the new mount API (return ENOSYS back to userspace)?
> >
> > Its possible, it will take another patch or two kernel side for this
> > case. We could even add a boolean and make it so you can apply mount
> > rules specifically only to the old or new interface.
> >
> >> We can obviously do this through a separate seccomp filter, but not
> >> everyone is using seccomp...
> >>
> >> Stéphane

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-05  6:57         ` Stéphane Graber
@ 2023-12-05  8:45           ` Linux regression tracking (Thorsten Leemhuis)
  2023-12-05 17:08             ` Christian Brauner
  2023-12-05 19:55           ` John Johansen
  1 sibling, 1 reply; 17+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-12-05  8:45 UTC (permalink / raw)
  To: Stéphane Graber, Linux regressions mailing list
  Cc: John Johansen, Christian Brauner, Sasha Levin, Aleksa Sarai,
	Alexander Mihalicyn, paul, James Morris, Serge Hallyn,
	linux-security-module, Linus Torvalds

[CCing Linus]

Hi, top-posting for once, to make this easily accessible to everyone.

Stéphane, many thx for your insights.

I'm CCing Linus on this, as I guess he wants to be aware of this.

Normally I try to sum up things somewhat when doing so, but this here
likely won't end well. So all I'm saying is: commit 157a3537d6bc28
("apparmor: Fix regression in mount mediation") [v6.7-rc1] that was also
included in v6.6.3 broke containers in some setups. John described that
commit with the words "it is a far from good solution. It is a stop
gap." and later mentioned that "a CVE is coming for this and that having
this unmediated in the tree isn't good either."

For more details please check out the thread that on lore started with
this message:

https://lore.kernel.org/all/CA+enf=u0UmgjKrd98EYkxFu7FYV8dR1SBYJn_1b0Naq=3twbbQ@mail.gmail.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

On 05.12.23 07:57, Stéphane Graber wrote:
> On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
>>
>> On 04.12.23 14:14, John Johansen wrote:
>>> On 12/3/23 17:34, Stéphane Graber wrote:
>>
>> Side note: Stéphane, thx for CCing the regressions list.
>>
>>>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
>>>> <john.johansen@canonical.com> wrote:
>>>>> On 12/2/23 17:20, Stéphane Graber wrote:
>>>>>>
>>>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>>>>> distributions using the newer version of util-linux.
>>>>>>
>>>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>>>>> the new mount command now performs:
>>>>>> ```
>>>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>>>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>>>>> stx_size=40, ...}) = 0
>>>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>>>>> ```
>>>>>>
>>>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>>>>> being created, this therefore results in:
>>>>>> ```
>>>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>>>>> class="mount" info="failed perms check" error=-13
>>>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>>>>> srcname="/" flags="rw, move"
>>>>>> ```
>>>>>>
>>>>>> Note that the flags here show "move", the fstype isn't even set and
>>>>>> the source path at srcname incorrectly shows "/".
>>>>>>
>>>>>> This operation therefore trips any container manager which has an
>>>>>> apparmor security policy preventing arbitrary move-mount (as those
>>>>>> could be used to bypass other apparmor path based policies).
>>>>>>
>>>>>>
>>>>>> The way I see it, the current mediation support effectively breaks any
>>>>>> attempt at mediating mounts in a useful way in apparmor as it's now
>>>>>> impossible to mediate new mounts based on their fstype or even
>>>>>> distinguish them from a move-mount operation.
>>>>>>
>>>>> Indeed it is a far from good solution. It is a stop gap.
>>>>>>
>>>>>> I don't know if this warrants pulling the mediation patch out of
>>>>>> stable (and out of linus' tree), obviously doing that would
>>>>>> reintroduce that hole in mount coverage, but at the same time, the
>>>>>> current coverage is broken enough that our only alternative is to
>>>>>> effectively allow all mounts, making the current mediation useless.
>>>>>
>>>>> pulling it effectively means ALL applications by-pass mediation, the
>>>>> alternative is to block all applications from using the move_mount
>>>>> system call as part of mediation. Which might have been acceptable
>>>>> as a stop gap when the syscall first landed but not now.
>>>>
>>>> Pulling it from the stable branch may still make sense, you now have
>>>> folks who are updating to get actual bugfixes and end up with broken
>>>> containers, that doesn't exactly seem like a good outcome...
>>>
>>> I will defer such a decision to the maintainers the stable trees. I
>>> can see arguments either way.
>>
>> FWIW, Greg usually does not revert a backported change if the change
>> causes the same problem to happen with mainline, as then it should be
>> fixed there as well. Which is normally the right thing to do, as Linus
>> wants people to be able to upgrade to new kernels without having them to
>> upgrade anything else (firmware, anything in userland incl. apparmor
>> policies).
>>
>> So normally this whole issue afaics would be "157a3537d6bc28 needs to be
>> fixed in mainline (if nothing else helps with a revert), and that fix
>> then needs to be backported to the various stable trees".
>>
>> It obviously is more complicated because that commit apparently fixes a
>> security vulnerability. But even under such circumstances Linus afaik
>> wants us to do everything possible to avoid breaking peoples workflows.
>> Which is why I wonder if there is a more elegant solution here
>> somewhere. I doubt that the answer is yes, but I'll ask the following
>> anyway: Would a kernel config option that distros could set when they
>> updated their Apparmor policies help here? Or something in sysfs?
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> 
> So there are a few different scenarios here:
> 
> 1) Distribution shipping application specific profiles that were
> developed back when the new VFS mount API wasn't around, those
> profiles allow the specific mount calls made by the application. The
> application has since moved on to using the VFS mount API and thanks
> to the lack of apparmor coverage, things have kept on working fine.
> Now when the distro pulls this bugfix, the application will now fail
> to perform the mount. Apparmor returns EACCESS which doesn't cause a
> fallback to the old mount API but instead a straight up error returned
> to the user. ENOSYS would usually be handled properly.
> 
> 2) All the container managers out there which use AppArmor as a safety
> net to block mostly misbehaving or misconfigured software. Those
> policies are not meant to be water tight to begin with (and due to the
> nature of containers, they really can't be). But they do catch a lot
> and while not an actual security barrier, they certainly prevent a lot
> of accidents. Unfortunately with the new mediation in place, the only
> real way to operate moving forward is to not mediate mounts at all as
> mediation of the new VFS API leads to just about everything getting
> denied and apparmor doesn't provide a way to separately mediate the
> old and new mount APIs.
> 
> 3) A system where apparmor is used to effectively prevent any mounting
> whatsoever, where the policy is now being bypassed by simply using the
> new VFS API. Such a system would also need to not be using Seccomp in
> combination with Apparmor as blocking the new VFS API syscalls would
> avoid the issue.
> 
> 
> I suspect 3) is what John is hinting at in this thread. I could
> certainly see this bug be used to bypass the snapd mount restrictions
> for example, though snapd also generates seccomp policies, so just
> blocking the new VFS API would be a much simpler fix there.
> 
> I have no idea how common 1) is, but it looks like we're about to find
> out soon and that has potential for some very "interesting" bug
> reports as mount related errors are usually not the easiest to
> understand and LSMs make them so much worse.
> 
> As for 2), as it stands, we're going to have to effectively turn off
> any of the mount related safety nets we had in place in LXC, Incus and
> likely most other container runtimes out there to handle this. The
> usual issue with doing that kind of heavy hammer change is that once
> it's done, it will take a long time to undo it. That is, once AppArmor
> is actually capable of mediating the way it's supposed to, a lot of
> the profiles will have been changed to just blanket allow all mounts
> and they're only ever going to be changed once we're confident that
> the vast majority of our users are running a kernel with the fixed
> logic.
> 
> I guess some of that could be handled better if there was a way to
> detect the current broken mediation of the new mount API and then
> later again, detect that the kernel now has working mediation,
> allowing for those container managers to generate accurate profile
> rather than have to go for the "safest" option (as far as keeping
> users running) and just keep allowing everything.
> 
> John, is there any such detection mechanism currently present that
> could be used by userspace to better handle this situation?
> 
> 
> Overall, this still feels very rushed and broken to me. I understand
> that being able to trivially bypass AppArmor mount rules isn't great,
> but shipping code that makes the vast majority of said rules useless
> doesn't really feel like such an improvement. It's effectively turning
> what was a reasonably flexible policy engine around mount calls, into
> a binary switch to either allow everything or block everything.
> 
> We at the very least need a way to check whether we're dealing with
> this known broken state so that any change that's made to userspace
> can be made condition on this, ensuring that once mediation actually
> works as intended, those policies also go back to the way they're
> supposed to be.
> 
> Stéphane
> 
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>>> But I will note that even if we had been able perfectly match the
>>> new interface mediation to the old breakage is still possible because
>>> containers could have been using new interface to do things not
>>> covered by the old interface rules.
>>>
>>> Not that it matters to the upstream kernel maintainers but a CVE
>>> is coming for this and that having this unmediated in the tree isn't
>>> good either.
>>>
>>>>> The situation can be broken down like this
>>>>> unconfined applications have no restrictions, this doesn't affect
>>>>> hem.
>>>>>
>>>>> confined applications using the old mount interface continue to
>>>>> work as expected.
>>>>>
>>>>> confined applications using the new mount interface now are blocked
>>>>> unless there is, admittedly, a very broad mount rule granted. This
>>>>> is however better than the old situation in which confined
>>>>> applications where not mediated at all.
>>>>
>>>> It depends from your point of view. If you are using apparmor around a
>>>> single application and have tailored rules for exactly what it is
>>>> doing, then yes, I agree.
>>>> But for all cases where apparmor is used as a safety net to block
>>>> potentially dangerous/problematic actions, this is causing the kind of
>>>> breakage which will cause people to just plain turn off apparmor.
>>>>
>>>
>>> Only in the sense that closing a security regression causes breakage.
>>>
>>> Lets cover the cases.
>>> Policy is not updated, and application uses the old mount interface:
>>> this continues to work.
>>>
>>> *Policy is not updated, and application uses the new mount interface:
>>> the access is likely blocked (the security hole has been closed)
>>> unless the policy already allowed a generic
>>>   allow mount,
>>>
>>> +Policy is updated with generic
>>>   allow mount,
>>> and application uses the old mount interface:
>>> this continues to work, security restrictions are reduced on the
>>> old interface, but policy reflects what can actually be enforced
>>> around mount, because if the application is compromised and uses
>>> the new mount interface this is what can be mediated.
>>>
>>> Policy is updated with generic
>>>   allow mount,
>>> and application uses the new mount interface:
>>> The application continues to work.
>>>
>>> * The breakage case is where the application was using the new
>>> interface and not being mediated. While being able to mediate
>>> the new interface to the same level of the old would help, it
>>> does not guarantee this case won't break, as the application
>>> could be using to do things that are not restricted by current
>>> policy.
>>>
>>> + The regressed mediation case, is only sort of reduced mediation.
>>> The policy can maintain all existing mount rules so that info
>>> is not lost going forward.
>>> The mediation on the old mount interface is reduced by the new
>>> mount rule, but at the same time it now possible to turn of
>>> the new mount interface, if mount is not being used by the
>>> application.
>>>
>>>>> The regression is in the policy, where an application like LXD/Incus
>>>>> are specify mount rules and loading policy based on it. Those
>>>>> restrictions continue to work on the old mount interface, they
>>>>> however do not work as one would expect when the new mount interface
>>>>> is used. To allow applications to use the new mount interface
>>>>> a broad mount rule is needed, which basically makes the other
>>>>> mount restrictions useless.
>>>>>
>>>>> This situation is however still better than without the patch
>>>>> because, applications trying to use the new mount interface
>>>>> were not restricted at all. The only regression is in mediation
>>>>> of applications that are using the old mount interface. But
>>>>> again those application, without the patch, can just use the
>>>>> new interface and by-pass the restrictions from those rules. In
>>>>> this case the broad rule is not good, but better than the by-pass.
>>>>>
>>>>> I am working on a patch that will improve the situation, over the
>>>>> current patch using the existing move_mount hook. I wanted to
>>>>> be able to get more testing in place, before we did more, and
>>>>> just didn't have the time.
>>>>
>>>> So just to understand, this change went upstream while it was known
>>>> that effectively no sane mount policy would work with it?
>>>>
>>>
>>> having a toggle to turn off mount is sane. It actually is a little
>>> better than that in that you can do more with some cases. It still
>>> allows full mediation under the old interface, it just can't match
>>> how fine the mediation is with the old interface.
>>>
>>>> That seems very odd, at that point, why not instead upstream a patch
>>>> that disables the new mount API when apparmor is in use with mount
>>>> rules?
>>>
>>> It breaks applications, some will fallback, some don't. Here you
>>> get a policy controllable toggle. Admittedly it returns -EACCES
>>> instead of a different error code but that could be adjusted.
>>>
>>>> That feels like it would have had the same theoretical benefits
>>>> without the breakage that this change caused.
>>>>
>>> I am willing to experiment with this more but it can break
>>> applications.
>>>
>>>>> But the reality is, apparmor won't be able to achieve equivalent
>>>>> mediation without new hooks or at least more mount context passed
>>>>> into the existing hook. That to is being looked at but won't be
>>>>> is even further out.
>>>>
>>>> That doesn't sound encouraging as far as getting a suitable fix to our
>>>> users...
>>>>
>>> If by suitable fix you mean equivalence of mediation between the
>>> old and new interface, not in the short term.
>>>
>>>>
>>>> Is there even a way in apparmor policies to cleanly prevent the use of
>>>> the new mount API (return ENOSYS back to userspace)?
>>>
>>> Its possible, it will take another patch or two kernel side for this
>>> case. We could even add a boolean and make it so you can apply mount
>>> rules specifically only to the old or new interface.
>>>
>>>> We can obviously do this through a separate seccomp filter, but not
>>>> everyone is using seccomp...
>>>>
>>>> Stéphane
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-04  1:34   ` Apparmor move_mount mediation breaks mount tool in containers Stéphane Graber
  2023-12-04 13:14     ` John Johansen
@ 2023-12-05 12:24     ` Linux regression tracking #adding (Thorsten Leemhuis)
  2023-12-23  8:17       ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 1 reply; 17+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-12-05 12:24 UTC (permalink / raw)
  To: regressions

[TLDR: This mail in primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

Below issue is already discussed, but to ensure it does not fall through
the cracks let me add this to the tracking:

#regzbot ^introduced 157a3537d6bc28ceb9a11fc8cb67f2152d
#regzbot title apparmor: move_mount mediation breaks mount tool in
containers
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

On 04.12.23 02:34, Stéphane Graber wrote:
> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
> <john.johansen@canonical.com> wrote:
>>
>> On 12/2/23 17:20, Stéphane Graber wrote:
>>> Hey John,
>>>
>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>> distributions using the newer version of util-linux.
>>>
>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>> the new mount command now performs:
>>> ```
>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>> stx_size=40, ...}) = 0
>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>> ```
>>>
>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>> being created, this therefore results in:
>>> ```
>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>> class="mount" info="failed perms check" error=-13
>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>> srcname="/" flags="rw, move"
>>> ```
>>>
>>> Note that the flags here show "move", the fstype isn't even set and
>>> the source path at srcname incorrectly shows "/".
>>>
>>> This operation therefore trips any container manager which has an
>>> apparmor security policy preventing arbitrary move-mount (as those
>>> could be used to bypass other apparmor path based policies).
>>>
>>>
>>> The way I see it, the current mediation support effectively breaks any
>>> attempt at mediating mounts in a useful way in apparmor as it's now
>>> impossible to mediate new mounts based on their fstype or even
>>> distinguish them from a move-mount operation.
>>>
>> Indeed it is a far from good solution. It is a stop gap.
>>>
>>> I don't know if this warrants pulling the mediation patch out of
>>> stable (and out of linus' tree), obviously doing that would
>>> reintroduce that hole in mount coverage, but at the same time, the
>>> current coverage is broken enough that our only alternative is to
>>> effectively allow all mounts, making the current mediation useless.
>>
>> pulling it effectively means ALL applications by-pass mediation, the
>> alternative is to block all applications from using the move_mount
>> system call as part of mediation. Which might have been acceptable
>> as a stop gap when the syscall first landed but not now.
> 
> Pulling it from the stable branch may still make sense, you now have
> folks who are updating to get actual bugfixes and end up with broken
> containers, that doesn't exactly seem like a good outcome...
> 
>> The situation can be broken down like this
>> unconfined applications have no restrictions, this doesn't affect
>> hem.
>>
>> confined applications using the old mount interface continue to
>> work as expected.
>>
>> confined applications using the new mount interface now are blocked
>> unless there is, admittedly, a very broad mount rule granted. This
>> is however better than the old situation in which confined
>> applications where not mediated at all.
> 
> It depends from your point of view. If you are using apparmor around a
> single application and have tailored rules for exactly what it is
> doing, then yes, I agree.
> But for all cases where apparmor is used as a safety net to block
> potentially dangerous/problematic actions, this is causing the kind of
> breakage which will cause people to just plain turn off apparmor.
> 
>> The regression is in the policy, where an application like LXD/Incus
>> are specify mount rules and loading policy based on it. Those
>> restrictions continue to work on the old mount interface, they
>> however do not work as one would expect when the new mount interface
>> is used. To allow applications to use the new mount interface
>> a broad mount rule is needed, which basically makes the other
>> mount restrictions useless.
>>
>> This situation is however still better than without the patch
>> because, applications trying to use the new mount interface
>> were not restricted at all. The only regression is in mediation
>> of applications that are using the old mount interface. But
>> again those application, without the patch, can just use the
>> new interface and by-pass the restrictions from those rules. In
>> this case the broad rule is not good, but better than the by-pass.
>>
>> I am working on a patch that will improve the situation, over the
>> current patch using the existing move_mount hook. I wanted to
>> be able to get more testing in place, before we did more, and
>> just didn't have the time.
> 
> So just to understand, this change went upstream while it was known
> that effectively no sane mount policy would work with it?
> 
> That seems very odd, at that point, why not instead upstream a patch
> that disables the new mount API when apparmor is in use with mount
> rules?
> That feels like it would have had the same theoretical benefits
> without the breakage that this change caused.
> 
>> But the reality is, apparmor won't be able to achieve equivalent
>> mediation without new hooks or at least more mount context passed
>> into the existing hook. That to is being looked at but won't be
>> is even further out.
> 
> That doesn't sound encouraging as far as getting a suitable fix to our users...
> 
> 
> Is there even a way in apparmor policies to cleanly prevent the use of
> the new mount API (return ENOSYS back to userspace)?
> We can obviously do this through a separate seccomp filter, but not
> everyone is using seccomp...
> 
> Stéphane
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-05  8:45           ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-12-05 17:08             ` Christian Brauner
  2023-12-05 18:34               ` John Johansen
  0 siblings, 1 reply; 17+ messages in thread
From: Christian Brauner @ 2023-12-05 17:08 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Stéphane Graber, John Johansen, Sasha Levin, Aleksa Sarai,
	Alexander Mihalicyn, paul, James Morris, Serge Hallyn,
	linux-security-module, Linus Torvalds

On Tue, Dec 05, 2023 at 09:45:35AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
> [CCing Linus]
> 
> Hi, top-posting for once, to make this easily accessible to everyone.
> 
> Stéphane, many thx for your insights.
> 
> I'm CCing Linus on this, as I guess he wants to be aware of this.
> 
> Normally I try to sum up things somewhat when doing so, but this here
> likely won't end well. So all I'm saying is: commit 157a3537d6bc28
> ("apparmor: Fix regression in mount mediation") [v6.7-rc1] that was also
> included in v6.6.3 broke containers in some setups. John described that
> commit with the words "it is a far from good solution. It is a stop
> gap." and later mentioned that "a CVE is coming for this and that having
> this unmediated in the tree isn't good either."
> 
> For more details please check out the thread that on lore started with
> this message:
> 
> https://lore.kernel.org/all/CA+enf=u0UmgjKrd98EYkxFu7FYV8dR1SBYJn_1b0Naq=3twbbQ@mail.gmail.com/
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
> 
> On 05.12.23 07:57, Stéphane Graber wrote:
> > On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten
> > Leemhuis) <regressions@leemhuis.info> wrote:
> >>
> >> On 04.12.23 14:14, John Johansen wrote:
> >>> On 12/3/23 17:34, Stéphane Graber wrote:
> >>
> >> Side note: Stéphane, thx for CCing the regressions list.
> >>
> >>>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
> >>>> <john.johansen@canonical.com> wrote:
> >>>>> On 12/2/23 17:20, Stéphane Graber wrote:
> >>>>>>
> >>>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
> >>>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
> >>>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
> >>>>>> distributions using the newer version of util-linux.
> >>>>>>
> >>>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
> >>>>>> the new mount command now performs:
> >>>>>> ```
> >>>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
> >>>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
> >>>>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
> >>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
> >>>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
> >>>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
> >>>>>> stx_size=40, ...}) = 0
> >>>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
> >>>>>> ```
> >>>>>>
> >>>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
> >>>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
> >>>>>> being created, this therefore results in:
> >>>>>> ```
> >>>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
> >>>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
> >>>>>> class="mount" info="failed perms check" error=-13
> >>>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
> >>>>>> srcname="/" flags="rw, move"
> >>>>>> ```

move_mount() doesn't create a new mount it just moves an already
existing mount around in the tree. Either that mount is moved
(fsmount(), open_tree(OPEN_TREE_CLONE)) from it's source location or it
was a detached mount to begin with.

The new mount api splits mounting across multiple system calls. And the
state it keeps for itself is gone once you create a mount via fsmount().
Because then the fs_context will be destroyed and you're only dealing
with a superblock and mount(s) for it.

You shouldn't care about move_mount() moving a detached mount because
really it is not a move from some other filesystem location. The
creation of that mount will have already happened in
open_tree(OPEN_TREE_CLONE) or in fsmount(). So that ship has sailed and
move_mount() is the wrong place to do this.

If the only thing that would be broken here is that you want to block
moving mounts from a given source location then you need to pass down
that this is an actual move to the LSM hook. Because I'm really not keen
on giving LSMs access to is_anon_ns() or similar helpers. They're just
too easy to misuse.

> > Overall, this still feels very rushed and broken to me. I understand
> > that being able to trivially bypass AppArmor mount rules isn't great,
> > but shipping code that makes the vast majority of said rules useless
> > doesn't really feel like such an improvement. It's effectively turning
> > what was a reasonably flexible policy engine around mount calls, into
> > a binary switch to either allow everything or block everything.

For a 1:1 mapping of AppArmor LSM rules from the old to the new mount
api I expect that one will have to effectively redo everything
internally and keep state across multiple system calls. IMO, this is
really not the way to go. Instead it's probably wise to have a new mount
mediation policy for the new mount api with different semantics. Because
I have my doubts that a 1:1 mapping will even work and won't just cause
you CVEs.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-05 17:08             ` Christian Brauner
@ 2023-12-05 18:34               ` John Johansen
  2023-12-06 14:12                 ` Christian Brauner
  0 siblings, 1 reply; 17+ messages in thread
From: John Johansen @ 2023-12-05 18:34 UTC (permalink / raw)
  To: Christian Brauner, Linux regressions mailing list
  Cc: Stéphane Graber, Sasha Levin, Aleksa Sarai,
	Alexander Mihalicyn, paul, James Morris, Serge Hallyn,
	linux-security-module, Linus Torvalds

On 12/5/23 09:08, Christian Brauner wrote:
> On Tue, Dec 05, 2023 at 09:45:35AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
>> [CCing Linus]
>>
>> Hi, top-posting for once, to make this easily accessible to everyone.
>>
>> Stéphane, many thx for your insights.
>>
>> I'm CCing Linus on this, as I guess he wants to be aware of this.
>>
>> Normally I try to sum up things somewhat when doing so, but this here
>> likely won't end well. So all I'm saying is: commit 157a3537d6bc28
>> ("apparmor: Fix regression in mount mediation") [v6.7-rc1] that was also
>> included in v6.6.3 broke containers in some setups. John described that
>> commit with the words "it is a far from good solution. It is a stop
>> gap." and later mentioned that "a CVE is coming for this and that having
>> this unmediated in the tree isn't good either."
>>
>> For more details please check out the thread that on lore started with
>> this message:
>>
>> https://lore.kernel.org/all/CA+enf=u0UmgjKrd98EYkxFu7FYV8dR1SBYJn_1b0Naq=3twbbQ@mail.gmail.com/
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>> On 05.12.23 07:57, Stéphane Graber wrote:
>>> On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten
>>> Leemhuis) <regressions@leemhuis.info> wrote:
>>>>
>>>> On 04.12.23 14:14, John Johansen wrote:
>>>>> On 12/3/23 17:34, Stéphane Graber wrote:
>>>>
>>>> Side note: Stéphane, thx for CCing the regressions list.
>>>>
>>>>>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
>>>>>> <john.johansen@canonical.com> wrote:
>>>>>>> On 12/2/23 17:20, Stéphane Graber wrote:
>>>>>>>>
>>>>>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>>>>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>>>>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>>>>>>> distributions using the newer version of util-linux.
>>>>>>>>
>>>>>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>>>>>>> the new mount command now performs:
>>>>>>>> ```
>>>>>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>>>>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>>>>>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>>>>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>>>>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>>>>>>> stx_size=40, ...}) = 0
>>>>>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>>>>>>> ```
>>>>>>>>
>>>>>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>>>>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>>>>>>> being created, this therefore results in:
>>>>>>>> ```
>>>>>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>>>>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>>>>>>> class="mount" info="failed perms check" error=-13
>>>>>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>>>>>>> srcname="/" flags="rw, move"
>>>>>>>> ```
> 
> move_mount() doesn't create a new mount it just moves an already
> existing mount around in the tree. Either that mount is moved
> (fsmount(), open_tree(OPEN_TREE_CLONE)) from it's source location or it
> was a detached mount to begin with.
> 
yes, all to aware of this

> The new mount api splits mounting across multiple system calls. And the
> state it keeps for itself is gone once you create a mount via fsmount().

yep

> Because then the fs_context will be destroyed and you're only dealing
> with a superblock and mount(s) for it.
> 
yep

> You shouldn't care about move_mount() moving a detached mount because
> really it is not a move from some other filesystem location. The
> creation of that mount will have already happened in
> open_tree(OPEN_TREE_CLONE) or in fsmount(). So that ship has sailed and
> move_mount() is the wrong place to do this.
> 
yes and no. What we are specifically dong is mediating existing attached
mounts as always and blocking move_mount from attaching the detached
mount via move_mount. Which is very much something move_mount mediation
should be able to do.

It works as a stop gap for the new mount api bypassing the LSM at a very
course level. This is why it requires a very general mount rule for
apparmor to allow move_mount of the detached mount. A conditional that
improves control around this is coming, but we can't make move_mount()
mediation provide the full set of apparmor old mount mediation.

Providing something close to what we were doing before is going to
require new hooks, and likely specialized states tailored to the
new mount api. Until that does happen it does mean a reduction in
what apparmor policy can mediate. Today instead of being able to
specify details about the mount you need a generic mount rule that
just allows the application to do pretty much anything with mounts,
soon you will be able to have an addition conditional that allows
better control of move mount, and better logging of what is going on.

> If the only thing that would be broken here is that you want to block
> moving mounts from a given source location then you need to pass down
> that this is an actual move to the LSM hook. Because I'm really not keen
> on giving LSMs access to is_anon_ns() or similar helpers. They're just
> too easy to misuse.
> 

indeed the LSM hook here needs to have some more info passed in.

>>> Overall, this still feels very rushed and broken to me. I understand
>>> that being able to trivially bypass AppArmor mount rules isn't great,
>>> but shipping code that makes the vast majority of said rules useless
>>> doesn't really feel like such an improvement. It's effectively turning
>>> what was a reasonably flexible policy engine around mount calls, into
>>> a binary switch to either allow everything or block everything.
> 
> For a 1:1 mapping of AppArmor LSM rules from the old to the new mount
> api I expect that one will have to effectively redo everything
> internally and keep state across multiple system calls. IMO, this is
> really not the way to go. Instead it's probably wise to have a new mount
> mediation policy for the new mount api with different semantics. Because
> I have my doubts that a 1:1 mapping will even work and won't just cause
> you CVEs.

yes the mount policy will have to be extended to support the new api.
Getting that right will take time.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-05  6:57         ` Stéphane Graber
  2023-12-05  8:45           ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-12-05 19:55           ` John Johansen
  2023-12-06  2:18             ` Stéphane Graber
  1 sibling, 1 reply; 17+ messages in thread
From: John Johansen @ 2023-12-05 19:55 UTC (permalink / raw)
  To: Stéphane Graber, Linux regressions mailing list
  Cc: Christian Brauner, Sasha Levin, Aleksa Sarai,
	Alexander Mihalicyn, paul, James Morris, Serge Hallyn,
	linux-security-module

On 12/4/23 22:57, Stéphane Graber wrote:
> On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
>>
>> On 04.12.23 14:14, John Johansen wrote:
>>> On 12/3/23 17:34, Stéphane Graber wrote:
>>
>> Side note: Stéphane, thx for CCing the regressions list.
>>
>>>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
>>>> <john.johansen@canonical.com> wrote:
>>>>> On 12/2/23 17:20, Stéphane Graber wrote:
>>>>>>
>>>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>>>>> distributions using the newer version of util-linux.
>>>>>>
>>>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>>>>> the new mount command now performs:
>>>>>> ```
>>>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>>>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>>>>> stx_size=40, ...}) = 0
>>>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>>>>> ```
>>>>>>
>>>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>>>>> being created, this therefore results in:
>>>>>> ```
>>>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>>>>> class="mount" info="failed perms check" error=-13
>>>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>>>>> srcname="/" flags="rw, move"
>>>>>> ```
>>>>>>
>>>>>> Note that the flags here show "move", the fstype isn't even set and
>>>>>> the source path at srcname incorrectly shows "/".
>>>>>>
>>>>>> This operation therefore trips any container manager which has an
>>>>>> apparmor security policy preventing arbitrary move-mount (as those
>>>>>> could be used to bypass other apparmor path based policies).
>>>>>>
>>>>>>
>>>>>> The way I see it, the current mediation support effectively breaks any
>>>>>> attempt at mediating mounts in a useful way in apparmor as it's now
>>>>>> impossible to mediate new mounts based on their fstype or even
>>>>>> distinguish them from a move-mount operation.
>>>>>>
>>>>> Indeed it is a far from good solution. It is a stop gap.
>>>>>>
>>>>>> I don't know if this warrants pulling the mediation patch out of
>>>>>> stable (and out of linus' tree), obviously doing that would
>>>>>> reintroduce that hole in mount coverage, but at the same time, the
>>>>>> current coverage is broken enough that our only alternative is to
>>>>>> effectively allow all mounts, making the current mediation useless.
>>>>>
>>>>> pulling it effectively means ALL applications by-pass mediation, the
>>>>> alternative is to block all applications from using the move_mount
>>>>> system call as part of mediation. Which might have been acceptable
>>>>> as a stop gap when the syscall first landed but not now.
>>>>
>>>> Pulling it from the stable branch may still make sense, you now have
>>>> folks who are updating to get actual bugfixes and end up with broken
>>>> containers, that doesn't exactly seem like a good outcome...
>>>
>>> I will defer such a decision to the maintainers the stable trees. I
>>> can see arguments either way.
>>
>> FWIW, Greg usually does not revert a backported change if the change
>> causes the same problem to happen with mainline, as then it should be
>> fixed there as well. Which is normally the right thing to do, as Linus
>> wants people to be able to upgrade to new kernels without having them to
>> upgrade anything else (firmware, anything in userland incl. apparmor
>> policies).
>>
>> So normally this whole issue afaics would be "157a3537d6bc28 needs to be
>> fixed in mainline (if nothing else helps with a revert), and that fix
>> then needs to be backported to the various stable trees".
>>
>> It obviously is more complicated because that commit apparently fixes a
>> security vulnerability. But even under such circumstances Linus afaik
>> wants us to do everything possible to avoid breaking peoples workflows.
>> Which is why I wonder if there is a more elegant solution here
>> somewhere. I doubt that the answer is yes, but I'll ask the following
>> anyway: Would a kernel config option that distros could set when they
>> updated their Apparmor policies help here? Or something in sysfs?
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> 
> So there are a few different scenarios here:
> 
> 1) Distribution shipping application specific profiles that were
> developed back when the new VFS mount API wasn't around, those
> profiles allow the specific mount calls made by the application. The
> application has since moved on to using the VFS mount API and thanks
> to the lack of apparmor coverage, things have kept on working fine.
> Now when the distro pulls this bugfix, the application will now fail
> to perform the mount. Apparmor returns EACCESS which doesn't cause a
> fallback to the old mount API but instead a straight up error returned
> to the user. ENOSYS would usually be handled properly.
> 
I am not opposed to conditionally returning ENOSYS. I also have no doubt
that doing so would break fewer applications, it will still break some.
As I said I will look into doing that

> 2) All the container managers out there which use AppArmor as a safety
> net to block mostly misbehaving or misconfigured software. Those
> policies are not meant to be water tight to begin with (and due to the
> nature of containers, they really can't be). But they do catch a lot
> and while not an actual security barrier, they certainly prevent a lot
> of accidents. Unfortunately with the new mediation in place, the only
> real way to operate moving forward is to not mediate mounts at all as
> mediation of the new VFS API leads to just about everything getting
> denied and apparmor doesn't provide a way to separately mediate the
> old and new mount APIs.
> 
atm no, I am working on a conditional for detached mounts. I am not
convinced we should completely separate move_mount() mediation for
attached mounts.

But to use any of that you will need a new apparmor userspace to
support it. LXD is using the mounts as a behavioral/advisory catch, so
allowing an unmediated move_mount() is not the end of the world. Apparmor
still needs to be able to mediate mounts for applications that aren't
containers and do need a tighter barrier.

The reality with the new mount api is that atm we can only mediate
move_mount(), but the new mount api does add a twist with the whole
detached mounts. The only way to support that on older releases is
using a very generic mount rule.

New userspaces can pick up more. Leaving move_mount() unmediated even
for older releases really isn't an appealing option. New kernels
get pulled back and used on old releases all the time.

> 3) A system where apparmor is used to effectively prevent any mounting
> whatsoever, where the policy is now being bypassed by simply using the
> new VFS API. Such a system would also need to not be using Seccomp in
> combination with Apparmor as blocking the new VFS API syscalls would
> avoid the issue.
> 
sure

> 
> I suspect 3) is what John is hinting at in this thread. I could
> certainly see this bug be used to bypass the snapd mount restrictions
> for example, though snapd also generates seccomp policies, so just
> blocking the new VFS API would be a much simpler fix there.
> 
I am concerned about more than the snapd case, yes snapd could and
should use seccomp, but apparmor should be able to stand on its own
for other cases.

> I have no idea how common 1) is, but it looks like we're about to find
> out soon and that has potential for some very "interesting" bug
> reports as mount related errors are usually not the easiest to
> understand and LSMs make them so much worse.
> 
unfortunately that is always the case when fixing a mediation regression.
This one sat way too long that doesn't mean it shouldn't get fixed.

Overall though, I think lxd/incus is probably the major one. Most people
are running targeted policies and have left most of the stuff doing
mounts unconfined.

> As for 2), as it stands, we're going to have to effectively turn off
> any of the mount related safety nets we had in place in LXC, Incus and
> likely most other container runtimes out there to handle this. The
> usual issue with doing that kind of heavy hammer change is that once
> it's done, it will take a long time to undo it. That is, once AppArmor
> is actually capable of mediating the way it's supposed to, a lot of
> the profiles will have been changed to just blanket allow all mounts
> and they're only ever going to be changed once we're confident that
> the vast majority of our users are running a kernel with the fixed
> logic.
> 
I am well aware of it. I fall on the side of leaving this completely
unmediated is slightly worse, but I can understand why someone
would choose to leave this unmediated.

> I guess some of that could be handled better if there was a way to
> detect the current broken mediation of the new mount API and then
> later again, detect that the kernel now has working mediation,
> allowing for those container managers to generate accurate profile
> rather than have to go for the "safest" option (as far as keeping
> users running) and just keep allowing everything.
> 
> John, is there any such detection mechanism currently present that
> could be used by userspace to better handle this situation?
> 
sadly not for the move_mount() patch, it is however a one line patch

diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 7465c39ad1bc..6cd52767cfeb 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -2373,6 +2373,7 @@ static struct aa_sfs_entry aa_sfs_entry_policy[] = {
  
  static struct aa_sfs_entry aa_sfs_entry_mount[] = {
  	AA_SFS_FILE_STRING("mask", "mount umount pivot_root"),
+	AA_SFS_FILE_STRING(move_mount, "detached"),
  	{ }
  };
  
or similar. That I could send out today.

it would surfaced the via the regular securityfs/apparmor interface as
/sys/kernel/security/apparmor/features/mount/move_mount

I could also get you a simple conditional patch around returning ENOSYS
to start testing, so that we could potentially try sending a final version
of that patch up next week.

> 
> Overall, this still feels very rushed and broken to me. I understand
> that being able to trivially bypass AppArmor mount rules isn't great,
> but shipping code that makes the vast majority of said rules useless
> doesn't really feel like such an improvement. It's effectively turning
> what was a reasonably flexible policy engine around mount calls, into
> a binary switch to either allow everything or block everything.
> 
I get what you are saying. The other side of the coin is oh look apparmor
isn't mediating mount if I just do this ...

> We at the very least need a way to check whether we're dealing with
> this known broken state so that any change that's made to userspace
> can be made condition on this, ensuring that once mediation actually
> works as intended, those policies also go back to the way they're
> supposed to be.
> 
agreed. That really should have been part of the patch to begin with.



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-05 19:55           ` John Johansen
@ 2023-12-06  2:18             ` Stéphane Graber
  2023-12-06  3:16               ` John Johansen
  2024-01-02 22:09               ` John Johansen
  0 siblings, 2 replies; 17+ messages in thread
From: Stéphane Graber @ 2023-12-06  2:18 UTC (permalink / raw)
  To: John Johansen
  Cc: Linux regressions mailing list, Christian Brauner, Sasha Levin,
	Aleksa Sarai, Alexander Mihalicyn, paul, James Morris,
	Serge Hallyn, linux-security-module

On Tue, Dec 5, 2023 at 2:55 PM John Johansen
<john.johansen@canonical.com> wrote:
>
> On 12/4/23 22:57, Stéphane Graber wrote:
> > On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten
> > Leemhuis) <regressions@leemhuis.info> wrote:
> >>
> >> On 04.12.23 14:14, John Johansen wrote:
> >>> On 12/3/23 17:34, Stéphane Graber wrote:
> >>
> >> Side note: Stéphane, thx for CCing the regressions list.
> >>
> >>>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
> >>>> <john.johansen@canonical.com> wrote:
> >>>>> On 12/2/23 17:20, Stéphane Graber wrote:
> >>>>>>
> >>>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
> >>>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
> >>>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
> >>>>>> distributions using the newer version of util-linux.
> >>>>>>
> >>>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
> >>>>>> the new mount command now performs:
> >>>>>> ```
> >>>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
> >>>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
> >>>>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
> >>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
> >>>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
> >>>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
> >>>>>> stx_size=40, ...}) = 0
> >>>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
> >>>>>> ```
> >>>>>>
> >>>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
> >>>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
> >>>>>> being created, this therefore results in:
> >>>>>> ```
> >>>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
> >>>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
> >>>>>> class="mount" info="failed perms check" error=-13
> >>>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
> >>>>>> srcname="/" flags="rw, move"
> >>>>>> ```
> >>>>>>
> >>>>>> Note that the flags here show "move", the fstype isn't even set and
> >>>>>> the source path at srcname incorrectly shows "/".
> >>>>>>
> >>>>>> This operation therefore trips any container manager which has an
> >>>>>> apparmor security policy preventing arbitrary move-mount (as those
> >>>>>> could be used to bypass other apparmor path based policies).
> >>>>>>
> >>>>>>
> >>>>>> The way I see it, the current mediation support effectively breaks any
> >>>>>> attempt at mediating mounts in a useful way in apparmor as it's now
> >>>>>> impossible to mediate new mounts based on their fstype or even
> >>>>>> distinguish them from a move-mount operation.
> >>>>>>
> >>>>> Indeed it is a far from good solution. It is a stop gap.
> >>>>>>
> >>>>>> I don't know if this warrants pulling the mediation patch out of
> >>>>>> stable (and out of linus' tree), obviously doing that would
> >>>>>> reintroduce that hole in mount coverage, but at the same time, the
> >>>>>> current coverage is broken enough that our only alternative is to
> >>>>>> effectively allow all mounts, making the current mediation useless.
> >>>>>
> >>>>> pulling it effectively means ALL applications by-pass mediation, the
> >>>>> alternative is to block all applications from using the move_mount
> >>>>> system call as part of mediation. Which might have been acceptable
> >>>>> as a stop gap when the syscall first landed but not now.
> >>>>
> >>>> Pulling it from the stable branch may still make sense, you now have
> >>>> folks who are updating to get actual bugfixes and end up with broken
> >>>> containers, that doesn't exactly seem like a good outcome...
> >>>
> >>> I will defer such a decision to the maintainers the stable trees. I
> >>> can see arguments either way.
> >>
> >> FWIW, Greg usually does not revert a backported change if the change
> >> causes the same problem to happen with mainline, as then it should be
> >> fixed there as well. Which is normally the right thing to do, as Linus
> >> wants people to be able to upgrade to new kernels without having them to
> >> upgrade anything else (firmware, anything in userland incl. apparmor
> >> policies).
> >>
> >> So normally this whole issue afaics would be "157a3537d6bc28 needs to be
> >> fixed in mainline (if nothing else helps with a revert), and that fix
> >> then needs to be backported to the various stable trees".
> >>
> >> It obviously is more complicated because that commit apparently fixes a
> >> security vulnerability. But even under such circumstances Linus afaik
> >> wants us to do everything possible to avoid breaking peoples workflows.
> >> Which is why I wonder if there is a more elegant solution here
> >> somewhere. I doubt that the answer is yes, but I'll ask the following
> >> anyway: Would a kernel config option that distros could set when they
> >> updated their Apparmor policies help here? Or something in sysfs?
> >>
> >> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >
> > So there are a few different scenarios here:
> >
> > 1) Distribution shipping application specific profiles that were
> > developed back when the new VFS mount API wasn't around, those
> > profiles allow the specific mount calls made by the application. The
> > application has since moved on to using the VFS mount API and thanks
> > to the lack of apparmor coverage, things have kept on working fine.
> > Now when the distro pulls this bugfix, the application will now fail
> > to perform the mount. Apparmor returns EACCESS which doesn't cause a
> > fallback to the old mount API but instead a straight up error returned
> > to the user. ENOSYS would usually be handled properly.
> >
> I am not opposed to conditionally returning ENOSYS. I also have no doubt
> that doing so would break fewer applications, it will still break some.
> As I said I will look into doing that

Thanks, that would certainly be a much better stop gap measure until
such time as the mount API can properly be mediated by AppArmor.

> > 2) All the container managers out there which use AppArmor as a safety
> > net to block mostly misbehaving or misconfigured software. Those
> > policies are not meant to be water tight to begin with (and due to the
> > nature of containers, they really can't be). But they do catch a lot
> > and while not an actual security barrier, they certainly prevent a lot
> > of accidents. Unfortunately with the new mediation in place, the only
> > real way to operate moving forward is to not mediate mounts at all as
> > mediation of the new VFS API leads to just about everything getting
> > denied and apparmor doesn't provide a way to separately mediate the
> > old and new mount APIs.
> >
> atm no, I am working on a conditional for detached mounts. I am not
> convinced we should completely separate move_mount() mediation for
> attached mounts.
>
> But to use any of that you will need a new apparmor userspace to
> support it. LXD is using the mounts as a behavioral/advisory catch, so
> allowing an unmediated move_mount() is not the end of the world. Apparmor
> still needs to be able to mediate mounts for applications that aren't
> containers and do need a tighter barrier.
>
> The reality with the new mount api is that atm we can only mediate
> move_mount(), but the new mount api does add a twist with the whole
> detached mounts. The only way to support that on older releases is
> using a very generic mount rule.
>
> New userspaces can pick up more. Leaving move_mount() unmediated even
> for older releases really isn't an appealing option. New kernels
> get pulled back and used on old releases all the time.
>
> > 3) A system where apparmor is used to effectively prevent any mounting
> > whatsoever, where the policy is now being bypassed by simply using the
> > new VFS API. Such a system would also need to not be using Seccomp in
> > combination with Apparmor as blocking the new VFS API syscalls would
> > avoid the issue.
> >
> sure
>
> >
> > I suspect 3) is what John is hinting at in this thread. I could
> > certainly see this bug be used to bypass the snapd mount restrictions
> > for example, though snapd also generates seccomp policies, so just
> > blocking the new VFS API would be a much simpler fix there.
> >
> I am concerned about more than the snapd case, yes snapd could and
> should use seccomp, but apparmor should be able to stand on its own
> for other cases.
>
> > I have no idea how common 1) is, but it looks like we're about to find
> > out soon and that has potential for some very "interesting" bug
> > reports as mount related errors are usually not the easiest to
> > understand and LSMs make them so much worse.
> >
> unfortunately that is always the case when fixing a mediation regression.
> This one sat way too long that doesn't mean it shouldn't get fixed.
>
> Overall though, I think lxd/incus is probably the major one. Most people
> are running targeted policies and have left most of the stuff doing
> mounts unconfined.

We did some research with Aleksa last night and that does seem to be
true for the profiles we could find.

However there is a bit of a catch that Docker, Kubernetes, ... pretty
much all the application container options do support using AppArmor
but don't provide advanced profiles, instead they put the burden on
the user to attach profiles to specific containers.

I have no idea how many people actually do that and since they are
individually crafted profiles by those users, we have no idea what may
be in there and whether they got broken by this change.

> > As for 2), as it stands, we're going to have to effectively turn off
> > any of the mount related safety nets we had in place in LXC, Incus and
> > likely most other container runtimes out there to handle this. The
> > usual issue with doing that kind of heavy hammer change is that once
> > it's done, it will take a long time to undo it. That is, once AppArmor
> > is actually capable of mediating the way it's supposed to, a lot of
> > the profiles will have been changed to just blanket allow all mounts
> > and they're only ever going to be changed once we're confident that
> > the vast majority of our users are running a kernel with the fixed
> > logic.
> >
> I am well aware of it. I fall on the side of leaving this completely
> unmediated is slightly worse, but I can understand why someone
> would choose to leave this unmediated.
>
> > I guess some of that could be handled better if there was a way to
> > detect the current broken mediation of the new mount API and then
> > later again, detect that the kernel now has working mediation,
> > allowing for those container managers to generate accurate profile
> > rather than have to go for the "safest" option (as far as keeping
> > users running) and just keep allowing everything.
> >
> > John, is there any such detection mechanism currently present that
> > could be used by userspace to better handle this situation?
> >
> sadly not for the move_mount() patch, it is however a one line patch
>
> diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
> index 7465c39ad1bc..6cd52767cfeb 100644
> --- a/security/apparmor/apparmorfs.c
> +++ b/security/apparmor/apparmorfs.c
> @@ -2373,6 +2373,7 @@ static struct aa_sfs_entry aa_sfs_entry_policy[] = {
>
>   static struct aa_sfs_entry aa_sfs_entry_mount[] = {
>         AA_SFS_FILE_STRING("mask", "mount umount pivot_root"),
> +       AA_SFS_FILE_STRING(move_mount, "detached"),
>         { }
>   };
>
> or similar. That I could send out today.

I think that'd be useful to have as something that can be used to
detect this behavior from userspace and have the affected code turn
off mount mediation if that's the case.

> it would surfaced the via the regular securityfs/apparmor interface as
> /sys/kernel/security/apparmor/features/mount/move_mount

> I could also get you a simple conditional patch around returning ENOSYS
> to start testing, so that we could potentially try sending a final version
> of that patch up next week.

That'd be good. I feel that if this change had come with the ENOSYS
behavior, I wouldn't have noticed it at all.
With the new mount API being still pretty fresh, I'm not aware of any
userspace which today relies exclusively on it, so having it be
effectively disabled until such time as you can mediate it properly
shouldn't really break anyone (always hard to know, but sure would
break a lot less cases than the current change).

One catch though from my testing with seccomp, to make this work, you
need to have fsmount return ENOSYS, if you only have move_mount return
ENOSYS, it's too late and libmount just returns the failure to the
user rather than go down the old mount API code path.

> > Overall, this still feels very rushed and broken to me. I understand
> > that being able to trivially bypass AppArmor mount rules isn't great,
> > but shipping code that makes the vast majority of said rules useless
> > doesn't really feel like such an improvement. It's effectively turning
> > what was a reasonably flexible policy engine around mount calls, into
> > a binary switch to either allow everything or block everything.
> >
> I get what you are saying. The other side of the coin is oh look apparmor
> isn't mediating mount if I just do this ...

Sure, but I suspect we could have avoided a bunch of pain if we had a
chat around this change, had an opportunity to test it ahead of time.
At the very least, we'd have had the flag file introduced from the
start and hopefully some better handling like some kind of ENOSYS
option.

> > We at the very least need a way to check whether we're dealing with
> > this known broken state so that any change that's made to userspace
> > can be made condition on this, ensuring that once mediation actually
> > works as intended, those policies also go back to the way they're
> > supposed to be.
> >
> agreed. That really should have been part of the patch to begin with.

Stéphane

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-06  2:18             ` Stéphane Graber
@ 2023-12-06  3:16               ` John Johansen
  2024-01-02 22:09               ` John Johansen
  1 sibling, 0 replies; 17+ messages in thread
From: John Johansen @ 2023-12-06  3:16 UTC (permalink / raw)
  To: Stéphane Graber
  Cc: Linux regressions mailing list, Christian Brauner, Sasha Levin,
	Aleksa Sarai, Alexander Mihalicyn, paul, James Morris,
	Serge Hallyn, linux-security-module

On 12/5/23 18:18, Stéphane Graber wrote:
> On Tue, Dec 5, 2023 at 2:55 PM John Johansen
> <john.johansen@canonical.com> wrote:
>>
>> On 12/4/23 22:57, Stéphane Graber wrote:
>>> On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten
>>> Leemhuis) <regressions@leemhuis.info> wrote:
>>>>
>>>> On 04.12.23 14:14, John Johansen wrote:
>>>>> On 12/3/23 17:34, Stéphane Graber wrote:
>>>>
>>>> Side note: Stéphane, thx for CCing the regressions list.
>>>>
>>>>>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
>>>>>> <john.johansen@canonical.com> wrote:
>>>>>>> On 12/2/23 17:20, Stéphane Graber wrote:
>>>>>>>>
>>>>>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>>>>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>>>>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>>>>>>> distributions using the newer version of util-linux.
>>>>>>>>
>>>>>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>>>>>>> the new mount command now performs:
>>>>>>>> ```
>>>>>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>>>>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>>>>>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>>>>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>>>>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>>>>>>> stx_size=40, ...}) = 0
>>>>>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>>>>>>> ```
>>>>>>>>
>>>>>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>>>>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>>>>>>> being created, this therefore results in:
>>>>>>>> ```
>>>>>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>>>>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>>>>>>> class="mount" info="failed perms check" error=-13
>>>>>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>>>>>>> srcname="/" flags="rw, move"
>>>>>>>> ```
>>>>>>>>
>>>>>>>> Note that the flags here show "move", the fstype isn't even set and
>>>>>>>> the source path at srcname incorrectly shows "/".
>>>>>>>>
>>>>>>>> This operation therefore trips any container manager which has an
>>>>>>>> apparmor security policy preventing arbitrary move-mount (as those
>>>>>>>> could be used to bypass other apparmor path based policies).
>>>>>>>>
>>>>>>>>
>>>>>>>> The way I see it, the current mediation support effectively breaks any
>>>>>>>> attempt at mediating mounts in a useful way in apparmor as it's now
>>>>>>>> impossible to mediate new mounts based on their fstype or even
>>>>>>>> distinguish them from a move-mount operation.
>>>>>>>>
>>>>>>> Indeed it is a far from good solution. It is a stop gap.
>>>>>>>>
>>>>>>>> I don't know if this warrants pulling the mediation patch out of
>>>>>>>> stable (and out of linus' tree), obviously doing that would
>>>>>>>> reintroduce that hole in mount coverage, but at the same time, the
>>>>>>>> current coverage is broken enough that our only alternative is to
>>>>>>>> effectively allow all mounts, making the current mediation useless.
>>>>>>>
>>>>>>> pulling it effectively means ALL applications by-pass mediation, the
>>>>>>> alternative is to block all applications from using the move_mount
>>>>>>> system call as part of mediation. Which might have been acceptable
>>>>>>> as a stop gap when the syscall first landed but not now.
>>>>>>
>>>>>> Pulling it from the stable branch may still make sense, you now have
>>>>>> folks who are updating to get actual bugfixes and end up with broken
>>>>>> containers, that doesn't exactly seem like a good outcome...
>>>>>
>>>>> I will defer such a decision to the maintainers the stable trees. I
>>>>> can see arguments either way.
>>>>
>>>> FWIW, Greg usually does not revert a backported change if the change
>>>> causes the same problem to happen with mainline, as then it should be
>>>> fixed there as well. Which is normally the right thing to do, as Linus
>>>> wants people to be able to upgrade to new kernels without having them to
>>>> upgrade anything else (firmware, anything in userland incl. apparmor
>>>> policies).
>>>>
>>>> So normally this whole issue afaics would be "157a3537d6bc28 needs to be
>>>> fixed in mainline (if nothing else helps with a revert), and that fix
>>>> then needs to be backported to the various stable trees".
>>>>
>>>> It obviously is more complicated because that commit apparently fixes a
>>>> security vulnerability. But even under such circumstances Linus afaik
>>>> wants us to do everything possible to avoid breaking peoples workflows.
>>>> Which is why I wonder if there is a more elegant solution here
>>>> somewhere. I doubt that the answer is yes, but I'll ask the following
>>>> anyway: Would a kernel config option that distros could set when they
>>>> updated their Apparmor policies help here? Or something in sysfs?
>>>>
>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>
>>> So there are a few different scenarios here:
>>>
>>> 1) Distribution shipping application specific profiles that were
>>> developed back when the new VFS mount API wasn't around, those
>>> profiles allow the specific mount calls made by the application. The
>>> application has since moved on to using the VFS mount API and thanks
>>> to the lack of apparmor coverage, things have kept on working fine.
>>> Now when the distro pulls this bugfix, the application will now fail
>>> to perform the mount. Apparmor returns EACCESS which doesn't cause a
>>> fallback to the old mount API but instead a straight up error returned
>>> to the user. ENOSYS would usually be handled properly.
>>>
>> I am not opposed to conditionally returning ENOSYS. I also have no doubt
>> that doing so would break fewer applications, it will still break some.
>> As I said I will look into doing that
> 
> Thanks, that would certainly be a much better stop gap measure until
> such time as the mount API can properly be mediated by AppArmor.
> 
>>> 2) All the container managers out there which use AppArmor as a safety
>>> net to block mostly misbehaving or misconfigured software. Those
>>> policies are not meant to be water tight to begin with (and due to the
>>> nature of containers, they really can't be). But they do catch a lot
>>> and while not an actual security barrier, they certainly prevent a lot
>>> of accidents. Unfortunately with the new mediation in place, the only
>>> real way to operate moving forward is to not mediate mounts at all as
>>> mediation of the new VFS API leads to just about everything getting
>>> denied and apparmor doesn't provide a way to separately mediate the
>>> old and new mount APIs.
>>>
>> atm no, I am working on a conditional for detached mounts. I am not
>> convinced we should completely separate move_mount() mediation for
>> attached mounts.
>>
>> But to use any of that you will need a new apparmor userspace to
>> support it. LXD is using the mounts as a behavioral/advisory catch, so
>> allowing an unmediated move_mount() is not the end of the world. Apparmor
>> still needs to be able to mediate mounts for applications that aren't
>> containers and do need a tighter barrier.
>>
>> The reality with the new mount api is that atm we can only mediate
>> move_mount(), but the new mount api does add a twist with the whole
>> detached mounts. The only way to support that on older releases is
>> using a very generic mount rule.
>>
>> New userspaces can pick up more. Leaving move_mount() unmediated even
>> for older releases really isn't an appealing option. New kernels
>> get pulled back and used on old releases all the time.
>>
>>> 3) A system where apparmor is used to effectively prevent any mounting
>>> whatsoever, where the policy is now being bypassed by simply using the
>>> new VFS API. Such a system would also need to not be using Seccomp in
>>> combination with Apparmor as blocking the new VFS API syscalls would
>>> avoid the issue.
>>>
>> sure
>>
>>>
>>> I suspect 3) is what John is hinting at in this thread. I could
>>> certainly see this bug be used to bypass the snapd mount restrictions
>>> for example, though snapd also generates seccomp policies, so just
>>> blocking the new VFS API would be a much simpler fix there.
>>>
>> I am concerned about more than the snapd case, yes snapd could and
>> should use seccomp, but apparmor should be able to stand on its own
>> for other cases.
>>
>>> I have no idea how common 1) is, but it looks like we're about to find
>>> out soon and that has potential for some very "interesting" bug
>>> reports as mount related errors are usually not the easiest to
>>> understand and LSMs make them so much worse.
>>>
>> unfortunately that is always the case when fixing a mediation regression.
>> This one sat way too long that doesn't mean it shouldn't get fixed.
>>
>> Overall though, I think lxd/incus is probably the major one. Most people
>> are running targeted policies and have left most of the stuff doing
>> mounts unconfined.
> 
> We did some research with Aleksa last night and that does seem to be
> true for the profiles we could find.
> 
> However there is a bit of a catch that Docker, Kubernetes, ... pretty
> much all the application container options do support using AppArmor
> but don't provide advanced profiles, instead they put the burden on
> the user to attach profiles to specific containers.
> 
> I have no idea how many people actually do that and since they are
> individually crafted profiles by those users, we have no idea what may
> be in there and whether they got broken by this change.
> 
>>> As for 2), as it stands, we're going to have to effectively turn off
>>> any of the mount related safety nets we had in place in LXC, Incus and
>>> likely most other container runtimes out there to handle this. The
>>> usual issue with doing that kind of heavy hammer change is that once
>>> it's done, it will take a long time to undo it. That is, once AppArmor
>>> is actually capable of mediating the way it's supposed to, a lot of
>>> the profiles will have been changed to just blanket allow all mounts
>>> and they're only ever going to be changed once we're confident that
>>> the vast majority of our users are running a kernel with the fixed
>>> logic.
>>>
>> I am well aware of it. I fall on the side of leaving this completely
>> unmediated is slightly worse, but I can understand why someone
>> would choose to leave this unmediated.
>>
>>> I guess some of that could be handled better if there was a way to
>>> detect the current broken mediation of the new mount API and then
>>> later again, detect that the kernel now has working mediation,
>>> allowing for those container managers to generate accurate profile
>>> rather than have to go for the "safest" option (as far as keeping
>>> users running) and just keep allowing everything.
>>>
>>> John, is there any such detection mechanism currently present that
>>> could be used by userspace to better handle this situation?
>>>
>> sadly not for the move_mount() patch, it is however a one line patch
>>
>> diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
>> index 7465c39ad1bc..6cd52767cfeb 100644
>> --- a/security/apparmor/apparmorfs.c
>> +++ b/security/apparmor/apparmorfs.c
>> @@ -2373,6 +2373,7 @@ static struct aa_sfs_entry aa_sfs_entry_policy[] = {
>>
>>    static struct aa_sfs_entry aa_sfs_entry_mount[] = {
>>          AA_SFS_FILE_STRING("mask", "mount umount pivot_root"),
>> +       AA_SFS_FILE_STRING(move_mount, "detached"),
>>          { }
>>    };
>>
>> or similar. That I could send out today.
> 
> I think that'd be useful to have as something that can be used to
> detect this behavior from userspace and have the affected code turn
> off mount mediation if that's the case.
> 
>> it would surfaced the via the regular securityfs/apparmor interface as
>> /sys/kernel/security/apparmor/features/mount/move_mount
> 
>> I could also get you a simple conditional patch around returning ENOSYS
>> to start testing, so that we could potentially try sending a final version
>> of that patch up next week.
> 
> That'd be good. I feel that if this change had come with the ENOSYS
> behavior, I wouldn't have noticed it at all.
> With the new mount API being still pretty fresh, I'm not aware of any
> userspace which today relies exclusively on it, so having it be
> effectively disabled until such time as you can mediate it properly
> shouldn't really break anyone (always hard to know, but sure would
> break a lot less cases than the current change).
> 
> One catch though from my testing with seccomp, to make this work, you
> need to have fsmount return ENOSYS, if you only have move_mount return
> ENOSYS, it's too late and libmount just returns the failure to the
> user rather than go down the old mount API code path.
> 

ugh, that is a problem. we don't have a hook in fsmount(), the closest
we get is the dentry_open() hook, but I don't see how a call to that
could reasonably detect that its coming from fsmount.

I am not opposed to a new hook at all, and I will throw something together
but I don't see that going into 6.7

>>> Overall, this still feels very rushed and broken to me. I understand
>>> that being able to trivially bypass AppArmor mount rules isn't great,
>>> but shipping code that makes the vast majority of said rules useless
>>> doesn't really feel like such an improvement. It's effectively turning
>>> what was a reasonably flexible policy engine around mount calls, into
>>> a binary switch to either allow everything or block everything.
>>>
>> I get what you are saying. The other side of the coin is oh look apparmor
>> isn't mediating mount if I just do this ...
> 
> Sure, but I suspect we could have avoided a bunch of pain if we had a
> chat around this change, had an opportunity to test it ahead of time.
> At the very least, we'd have had the flag file introduced from the
> start and hopefully some better handling like some kind of ENOSYS
> option.
> 
Sadly, yes I would have loved more testing, and I should have thought to
to poke you directly. I deliberately delayed sending this up during last
cycle when the report to kernel.org came in. I kicked a kernel with it over
to lxd (while still treating the issue as being in embargo. sigh, ...), and
then put it into the ubuntu proposed kernels and apparmor-next.

heard plenty on the user namespace mediation, but nothing of significance
around move_mount.

>>> We at the very least need a way to check whether we're dealing with
>>> this known broken state so that any change that's made to userspace
>>> can be made condition on this, ensuring that once mediation actually
>>> works as intended, those policies also go back to the way they're
>>> supposed to be.
>>>
>> agreed. That really should have been part of the patch to begin with.
> 
> Stéphane


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-05 18:34               ` John Johansen
@ 2023-12-06 14:12                 ` Christian Brauner
  2023-12-06 19:21                   ` John Johansen
  0 siblings, 1 reply; 17+ messages in thread
From: Christian Brauner @ 2023-12-06 14:12 UTC (permalink / raw)
  To: John Johansen
  Cc: Linux regressions mailing list, Stéphane Graber,
	Sasha Levin, Aleksa Sarai, Alexander Mihalicyn, paul,
	James Morris, Serge Hallyn, linux-security-module,
	Linus Torvalds

On Tue, Dec 05, 2023 at 10:34:33AM -0800, John Johansen wrote:
> On 12/5/23 09:08, Christian Brauner wrote:
> > On Tue, Dec 05, 2023 at 09:45:35AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
> > > [CCing Linus]
> > > 
> > > Hi, top-posting for once, to make this easily accessible to everyone.
> > > 
> > > Stéphane, many thx for your insights.
> > > 
> > > I'm CCing Linus on this, as I guess he wants to be aware of this.
> > > 
> > > Normally I try to sum up things somewhat when doing so, but this here
> > > likely won't end well. So all I'm saying is: commit 157a3537d6bc28
> > > ("apparmor: Fix regression in mount mediation") [v6.7-rc1] that was also
> > > included in v6.6.3 broke containers in some setups. John described that
> > > commit with the words "it is a far from good solution. It is a stop
> > > gap." and later mentioned that "a CVE is coming for this and that having
> > > this unmediated in the tree isn't good either."
> > > 
> > > For more details please check out the thread that on lore started with
> > > this message:
> > > 
> > > https://lore.kernel.org/all/CA+enf=u0UmgjKrd98EYkxFu7FYV8dR1SBYJn_1b0Naq=3twbbQ@mail.gmail.com/
> > > 
> > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > > --
> > > Everything you wanna know about Linux kernel regression tracking:
> > > https://linux-regtracking.leemhuis.info/about/#tldr
> > > If I did something stupid, please tell me, as explained on that page.
> > > 
> > > On 05.12.23 07:57, Stéphane Graber wrote:
> > > > On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten
> > > > Leemhuis) <regressions@leemhuis.info> wrote:
> > > > > 
> > > > > On 04.12.23 14:14, John Johansen wrote:
> > > > > > On 12/3/23 17:34, Stéphane Graber wrote:
> > > > > 
> > > > > Side note: Stéphane, thx for CCing the regressions list.
> > > > > 
> > > > > > > On Sun, Dec 3, 2023 at 8:21 PM John Johansen
> > > > > > > <john.johansen@canonical.com> wrote:
> > > > > > > > On 12/2/23 17:20, Stéphane Graber wrote:
> > > > > > > > > 
> > > > > > > > > Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
> > > > > > > > > landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
> > > > > > > > > blocking new mounts for all LXC, LXD and Incus users (at least) on
> > > > > > > > > distributions using the newer version of util-linux.
> > > > > > > > > 
> > > > > > > > > That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
> > > > > > > > > the new mount command now performs:
> > > > > > > > > ```
> > > > > > > > > fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
> > > > > > > > > fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
> > > > > > > > > fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
> > > > > > > > > statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
> > > > > > > > > {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
> > > > > > > > > stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
> > > > > > > > > stx_size=40, ...}) = 0
> > > > > > > > > move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
> > > > > > > > > ```
> > > > > > > > > 
> > > > > > > > > That last call to "move_mount" is incorrectly interpreted by AppArmor
> > > > > > > > > as an attempt to move-mount "/" to "/mnt" rather than as a new mount
> > > > > > > > > being created, this therefore results in:
> > > > > > > > > ```
> > > > > > > > > Dec 03 01:05:03 kernel-test kernel: audit: type=1400
> > > > > > > > > audit(1701565503.599:34): apparmor="DENIED" operation="mount"
> > > > > > > > > class="mount" info="failed perms check" error=-13
> > > > > > > > > profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
> > > > > > > > > srcname="/" flags="rw, move"
> > > > > > > > > ```
> > 
> > move_mount() doesn't create a new mount it just moves an already
> > existing mount around in the tree. Either that mount is moved
> > (fsmount(), open_tree(OPEN_TREE_CLONE)) from it's source location or it
> > was a detached mount to begin with.
> > 
> yes, all to aware of this
> 
> > The new mount api splits mounting across multiple system calls. And the
> > state it keeps for itself is gone once you create a mount via fsmount().
> 
> yep
> 
> > Because then the fs_context will be destroyed and you're only dealing
> > with a superblock and mount(s) for it.
> > 
> yep
> 
> > You shouldn't care about move_mount() moving a detached mount because
> > really it is not a move from some other filesystem location. The
> > creation of that mount will have already happened in
> > open_tree(OPEN_TREE_CLONE) or in fsmount(). So that ship has sailed and
> > move_mount() is the wrong place to do this.
> > 
> yes and no. What we are specifically dong is mediating existing attached
> mounts as always and blocking move_mount from attaching the detached
> mount via move_mount. Which is very much something move_mount mediation
> should be able to do.

Yes, but my point is that it's often at the wrong point in time. You
can mediate this earlier as well - see my open_tree() comment below.

> 
> It works as a stop gap for the new mount api bypassing the LSM at a very
> course level. This is why it requires a very general mount rule for
> apparmor to allow move_mount of the detached mount. A conditional that
> improves control around this is coming, but we can't make move_mount()
> mediation provide the full set of apparmor old mount mediation.
> 
> Providing something close to what we were doing before is going to
> require new hooks, and likely specialized states tailored to the
> new mount api. Until that does happen it does mean a reduction in
> what apparmor policy can mediate. Today instead of being able to
> specify details about the mount you need a generic mount rule that
> just allows the application to do pretty much anything with mounts,
> soon you will be able to have an addition conditional that allows
> better control of move mount, and better logging of what is going on.

If you want to start doing full mediation for the new mount api you're
likely missing quite a few pieces. 

I expect you need a hook in at least fsmount() as this always creates a
detached mount for a new filesystem context (You touched on that in an
earlier mail.).

But you also need a new hook in open_tree() but _only_ in the
OPEN_TREE_CLONE path because that creates detached mounts as well.

And you will also very likely want a new hook in mount_setattr() because
that allows you to change mount properties as well and the creation of
idmapped mounts and so on.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-06 14:12                 ` Christian Brauner
@ 2023-12-06 19:21                   ` John Johansen
  0 siblings, 0 replies; 17+ messages in thread
From: John Johansen @ 2023-12-06 19:21 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linux regressions mailing list, Stéphane Graber,
	Sasha Levin, Aleksa Sarai, Alexander Mihalicyn, paul,
	James Morris, Serge Hallyn, linux-security-module,
	Linus Torvalds

On 12/6/23 06:12, Christian Brauner wrote:
> On Tue, Dec 05, 2023 at 10:34:33AM -0800, John Johansen wrote:
>> On 12/5/23 09:08, Christian Brauner wrote:
>>> On Tue, Dec 05, 2023 at 09:45:35AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>> [CCing Linus]
>>>>
>>>> Hi, top-posting for once, to make this easily accessible to everyone.
>>>>
>>>> Stéphane, many thx for your insights.
>>>>
>>>> I'm CCing Linus on this, as I guess he wants to be aware of this.
>>>>
>>>> Normally I try to sum up things somewhat when doing so, but this here
>>>> likely won't end well. So all I'm saying is: commit 157a3537d6bc28
>>>> ("apparmor: Fix regression in mount mediation") [v6.7-rc1] that was also
>>>> included in v6.6.3 broke containers in some setups. John described that
>>>> commit with the words "it is a far from good solution. It is a stop
>>>> gap." and later mentioned that "a CVE is coming for this and that having
>>>> this unmediated in the tree isn't good either."
>>>>
>>>> For more details please check out the thread that on lore started with
>>>> this message:
>>>>
>>>> https://lore.kernel.org/all/CA+enf=u0UmgjKrd98EYkxFu7FYV8dR1SBYJn_1b0Naq=3twbbQ@mail.gmail.com/
>>>>
>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>> --
>>>> Everything you wanna know about Linux kernel regression tracking:
>>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>>> If I did something stupid, please tell me, as explained on that page.
>>>>
>>>> On 05.12.23 07:57, Stéphane Graber wrote:
>>>>> On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten
>>>>> Leemhuis) <regressions@leemhuis.info> wrote:
>>>>>>
>>>>>> On 04.12.23 14:14, John Johansen wrote:
>>>>>>> On 12/3/23 17:34, Stéphane Graber wrote:
>>>>>>
>>>>>> Side note: Stéphane, thx for CCing the regressions list.
>>>>>>
>>>>>>>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
>>>>>>>> <john.johansen@canonical.com> wrote:
>>>>>>>>> On 12/2/23 17:20, Stéphane Graber wrote:
>>>>>>>>>>
>>>>>>>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>>>>>>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>>>>>>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>>>>>>>>> distributions using the newer version of util-linux.
>>>>>>>>>>
>>>>>>>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>>>>>>>>> the new mount command now performs:
>>>>>>>>>> ```
>>>>>>>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>>>>>>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>>>>>>>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>>>>>>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>>>>>>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>>>>>>>>> stx_size=40, ...}) = 0
>>>>>>>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>>>>>>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>>>>>>>>> being created, this therefore results in:
>>>>>>>>>> ```
>>>>>>>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>>>>>>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>>>>>>>>> class="mount" info="failed perms check" error=-13
>>>>>>>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>>>>>>>>> srcname="/" flags="rw, move"
>>>>>>>>>> ```
>>>
>>> move_mount() doesn't create a new mount it just moves an already
>>> existing mount around in the tree. Either that mount is moved
>>> (fsmount(), open_tree(OPEN_TREE_CLONE)) from it's source location or it
>>> was a detached mount to begin with.
>>>
>> yes, all to aware of this
>>
>>> The new mount api splits mounting across multiple system calls. And the
>>> state it keeps for itself is gone once you create a mount via fsmount().
>>
>> yep
>>
>>> Because then the fs_context will be destroyed and you're only dealing
>>> with a superblock and mount(s) for it.
>>>
>> yep
>>
>>> You shouldn't care about move_mount() moving a detached mount because
>>> really it is not a move from some other filesystem location. The
>>> creation of that mount will have already happened in
>>> open_tree(OPEN_TREE_CLONE) or in fsmount(). So that ship has sailed and
>>> move_mount() is the wrong place to do this.
>>>
>> yes and no. What we are specifically dong is mediating existing attached
>> mounts as always and blocking move_mount from attaching the detached
>> mount via move_mount. Which is very much something move_mount mediation
>> should be able to do.
> 
> Yes, but my point is that it's often at the wrong point in time. You
> can mediate this earlier as well - see my open_tree() comment below.
> 

yes, there needs to be mediation happening earlier as well. The goal
here wasn't complete new mount api mediation, which is something that
we have been meaning to address but haven't been able to yet.

This was specifically about addressing move_mount() which can also
be used to move existing mounts in the system. While fixing that the
question is do we care about detached mounts, yes, and should we
address them to the point that we can within the current mediation,
that is to gate them on very course is mount mediation allowed, and from
a security perspective that is a yes too.

Under the new mount api, even if we are mediating at the other points
brought up below, we still want to be mediating the move_mount
because apparmor very much cares where these mounts end up in
the mount tree.


>>
>> It works as a stop gap for the new mount api bypassing the LSM at a very
>> course level. This is why it requires a very general mount rule for
>> apparmor to allow move_mount of the detached mount. A conditional that
>> improves control around this is coming, but we can't make move_mount()
>> mediation provide the full set of apparmor old mount mediation.
>>
>> Providing something close to what we were doing before is going to
>> require new hooks, and likely specialized states tailored to the
>> new mount api. Until that does happen it does mean a reduction in
>> what apparmor policy can mediate. Today instead of being able to
>> specify details about the mount you need a generic mount rule that
>> just allows the application to do pretty much anything with mounts,
>> soon you will be able to have an addition conditional that allows
>> better control of move mount, and better logging of what is going on.
> 
> If you want to start doing full mediation for the new mount api you're
> likely missing quite a few pieces.
> 
> I expect you need a hook in at least fsmount() as this always creates a
> detached mount for a new filesystem context (You touched on that in an
> earlier mail.).
> 
yes

> But you also need a new hook in open_tree() but _only_ in the
> OPEN_TREE_CLONE path because that creates detached mounts as well.
> 
yes

> And you will also very likely want a new hook in mount_setattr() because
> that allows you to change mount properties as well and the creation of
> idmapped mounts and so on.

agreed.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-05 12:24     ` Linux regression tracking #adding (Thorsten Leemhuis)
@ 2023-12-23  8:17       ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 0 replies; 17+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-12-23  8:17 UTC (permalink / raw)
  To: regressions

On 05.12.23 13:24, Linux regression tracking #adding (Thorsten Leemhuis)
wrote:
> [TLDR: This mail in primarily relevant for Linux kernel regression
> tracking. See link in footer if these mails annoy you.]
> 
> Below issue is already discussed, but to ensure it does not fall through
> the cracks let me add this to the tracking:
> 
> #regzbot ^introduced 157a3537d6bc28ceb9a11fc8cb67f2152d
> #regzbot title apparmor: move_mount mediation breaks mount tool in
> containers
> #regzbot ignore-activity

FWIW, I pointed Linus to this thread in two regression reports and he
didn't react publicly. So I for now assume things are fine from his
point of view:

#regzbot resolve: looks like things are fine for Linus as they are
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

> On 04.12.23 02:34, Stéphane Graber wrote:
>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
>> <john.johansen@canonical.com> wrote:
>>>
>>> On 12/2/23 17:20, Stéphane Graber wrote:
>>>> Hey John,
>>>>
>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>>> distributions using the newer version of util-linux.
>>>>
>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>>> the new mount command now performs:
>>>> ```
>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>>> stx_size=40, ...}) = 0
>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>>> ```
>>>>
>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>>> being created, this therefore results in:
>>>> ```
>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>>> class="mount" info="failed perms check" error=-13
>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>>> srcname="/" flags="rw, move"
>>>> ```
>>>>
>>>> Note that the flags here show "move", the fstype isn't even set and
>>>> the source path at srcname incorrectly shows "/".
>>>>
>>>> This operation therefore trips any container manager which has an
>>>> apparmor security policy preventing arbitrary move-mount (as those
>>>> could be used to bypass other apparmor path based policies).
>>>>
>>>>
>>>> The way I see it, the current mediation support effectively breaks any
>>>> attempt at mediating mounts in a useful way in apparmor as it's now
>>>> impossible to mediate new mounts based on their fstype or even
>>>> distinguish them from a move-mount operation.
>>>>
>>> Indeed it is a far from good solution. It is a stop gap.
>>>>
>>>> I don't know if this warrants pulling the mediation patch out of
>>>> stable (and out of linus' tree), obviously doing that would
>>>> reintroduce that hole in mount coverage, but at the same time, the
>>>> current coverage is broken enough that our only alternative is to
>>>> effectively allow all mounts, making the current mediation useless.
>>>
>>> pulling it effectively means ALL applications by-pass mediation, the
>>> alternative is to block all applications from using the move_mount
>>> system call as part of mediation. Which might have been acceptable
>>> as a stop gap when the syscall first landed but not now.
>>
>> Pulling it from the stable branch may still make sense, you now have
>> folks who are updating to get actual bugfixes and end up with broken
>> containers, that doesn't exactly seem like a good outcome...
>>
>>> The situation can be broken down like this
>>> unconfined applications have no restrictions, this doesn't affect
>>> hem.
>>>
>>> confined applications using the old mount interface continue to
>>> work as expected.
>>>
>>> confined applications using the new mount interface now are blocked
>>> unless there is, admittedly, a very broad mount rule granted. This
>>> is however better than the old situation in which confined
>>> applications where not mediated at all.
>>
>> It depends from your point of view. If you are using apparmor around a
>> single application and have tailored rules for exactly what it is
>> doing, then yes, I agree.
>> But for all cases where apparmor is used as a safety net to block
>> potentially dangerous/problematic actions, this is causing the kind of
>> breakage which will cause people to just plain turn off apparmor.
>>
>>> The regression is in the policy, where an application like LXD/Incus
>>> are specify mount rules and loading policy based on it. Those
>>> restrictions continue to work on the old mount interface, they
>>> however do not work as one would expect when the new mount interface
>>> is used. To allow applications to use the new mount interface
>>> a broad mount rule is needed, which basically makes the other
>>> mount restrictions useless.
>>>
>>> This situation is however still better than without the patch
>>> because, applications trying to use the new mount interface
>>> were not restricted at all. The only regression is in mediation
>>> of applications that are using the old mount interface. But
>>> again those application, without the patch, can just use the
>>> new interface and by-pass the restrictions from those rules. In
>>> this case the broad rule is not good, but better than the by-pass.
>>>
>>> I am working on a patch that will improve the situation, over the
>>> current patch using the existing move_mount hook. I wanted to
>>> be able to get more testing in place, before we did more, and
>>> just didn't have the time.
>>
>> So just to understand, this change went upstream while it was known
>> that effectively no sane mount policy would work with it?
>>
>> That seems very odd, at that point, why not instead upstream a patch
>> that disables the new mount API when apparmor is in use with mount
>> rules?
>> That feels like it would have had the same theoretical benefits
>> without the breakage that this change caused.
>>
>>> But the reality is, apparmor won't be able to achieve equivalent
>>> mediation without new hooks or at least more mount context passed
>>> into the existing hook. That to is being looked at but won't be
>>> is even further out.
>>
>> That doesn't sound encouraging as far as getting a suitable fix to our users...
>>
>>
>> Is there even a way in apparmor policies to cleanly prevent the use of
>> the new mount API (return ENOSYS back to userspace)?
>> We can obviously do this through a separate seccomp filter, but not
>> everyone is using seccomp...
>>
>> Stéphane
>>
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Apparmor move_mount mediation breaks mount tool in containers
  2023-12-06  2:18             ` Stéphane Graber
  2023-12-06  3:16               ` John Johansen
@ 2024-01-02 22:09               ` John Johansen
  1 sibling, 0 replies; 17+ messages in thread
From: John Johansen @ 2024-01-02 22:09 UTC (permalink / raw)
  To: Stéphane Graber
  Cc: Linux regressions mailing list, Christian Brauner, Sasha Levin,
	Aleksa Sarai, Alexander Mihalicyn, paul, James Morris,
	Serge Hallyn, linux-security-module

On 12/5/23 18:18, Stéphane Graber wrote:
> On Tue, Dec 5, 2023 at 2:55 PM John Johansen
> <john.johansen@canonical.com> wrote:
>>
>> On 12/4/23 22:57, Stéphane Graber wrote:
>>> On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten
>>> Leemhuis) <regressions@leemhuis.info> wrote:
>>>>
>>>> On 04.12.23 14:14, John Johansen wrote:
>>>>> On 12/3/23 17:34, Stéphane Graber wrote:
>>>>
>>>> Side note: Stéphane, thx for CCing the regressions list.
>>>>
>>>>>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
>>>>>> <john.johansen@canonical.com> wrote:
>>>>>>> On 12/2/23 17:20, Stéphane Graber wrote:
>>>>>>>>
>>>>>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>>>>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>>>>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>>>>>>> distributions using the newer version of util-linux.
>>>>>>>>
>>>>>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>>>>>>> the new mount command now performs:
>>>>>>>> ```
>>>>>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>>>>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>>>>>>> fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>>>>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>>>>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>>>>>>> stx_size=40, ...}) = 0
>>>>>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>>>>>>> ```
>>>>>>>>
>>>>>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>>>>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>>>>>>> being created, this therefore results in:
>>>>>>>> ```
>>>>>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>>>>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>>>>>>> class="mount" info="failed perms check" error=-13
>>>>>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>>>>>>> srcname="/" flags="rw, move"
>>>>>>>> ```
>>>>>>>>
>>>>>>>> Note that the flags here show "move", the fstype isn't even set and
>>>>>>>> the source path at srcname incorrectly shows "/".
>>>>>>>>
>>>>>>>> This operation therefore trips any container manager which has an
>>>>>>>> apparmor security policy preventing arbitrary move-mount (as those
>>>>>>>> could be used to bypass other apparmor path based policies).
>>>>>>>>
>>>>>>>>
>>>>>>>> The way I see it, the current mediation support effectively breaks any
>>>>>>>> attempt at mediating mounts in a useful way in apparmor as it's now
>>>>>>>> impossible to mediate new mounts based on their fstype or even
>>>>>>>> distinguish them from a move-mount operation.
>>>>>>>>
>>>>>>> Indeed it is a far from good solution. It is a stop gap.
>>>>>>>>
>>>>>>>> I don't know if this warrants pulling the mediation patch out of
>>>>>>>> stable (and out of linus' tree), obviously doing that would
>>>>>>>> reintroduce that hole in mount coverage, but at the same time, the
>>>>>>>> current coverage is broken enough that our only alternative is to
>>>>>>>> effectively allow all mounts, making the current mediation useless.
>>>>>>>
>>>>>>> pulling it effectively means ALL applications by-pass mediation, the
>>>>>>> alternative is to block all applications from using the move_mount
>>>>>>> system call as part of mediation. Which might have been acceptable
>>>>>>> as a stop gap when the syscall first landed but not now.
>>>>>>
>>>>>> Pulling it from the stable branch may still make sense, you now have
>>>>>> folks who are updating to get actual bugfixes and end up with broken
>>>>>> containers, that doesn't exactly seem like a good outcome...
>>>>>
>>>>> I will defer such a decision to the maintainers the stable trees. I
>>>>> can see arguments either way.
>>>>
>>>> FWIW, Greg usually does not revert a backported change if the change
>>>> causes the same problem to happen with mainline, as then it should be
>>>> fixed there as well. Which is normally the right thing to do, as Linus
>>>> wants people to be able to upgrade to new kernels without having them to
>>>> upgrade anything else (firmware, anything in userland incl. apparmor
>>>> policies).
>>>>
>>>> So normally this whole issue afaics would be "157a3537d6bc28 needs to be
>>>> fixed in mainline (if nothing else helps with a revert), and that fix
>>>> then needs to be backported to the various stable trees".
>>>>
>>>> It obviously is more complicated because that commit apparently fixes a
>>>> security vulnerability. But even under such circumstances Linus afaik
>>>> wants us to do everything possible to avoid breaking peoples workflows.
>>>> Which is why I wonder if there is a more elegant solution here
>>>> somewhere. I doubt that the answer is yes, but I'll ask the following
>>>> anyway: Would a kernel config option that distros could set when they
>>>> updated their Apparmor policies help here? Or something in sysfs?
>>>>
>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>
>>> So there are a few different scenarios here:
>>>
>>> 1) Distribution shipping application specific profiles that were
>>> developed back when the new VFS mount API wasn't around, those
>>> profiles allow the specific mount calls made by the application. The
>>> application has since moved on to using the VFS mount API and thanks
>>> to the lack of apparmor coverage, things have kept on working fine.
>>> Now when the distro pulls this bugfix, the application will now fail
>>> to perform the mount. Apparmor returns EACCESS which doesn't cause a
>>> fallback to the old mount API but instead a straight up error returned
>>> to the user. ENOSYS would usually be handled properly.
>>>
>> I am not opposed to conditionally returning ENOSYS. I also have no doubt
>> that doing so would break fewer applications, it will still break some.
>> As I said I will look into doing that
> 
> Thanks, that would certainly be a much better stop gap measure until
> such time as the mount API can properly be mediated by AppArmor.
> 
>>> 2) All the container managers out there which use AppArmor as a safety
>>> net to block mostly misbehaving or misconfigured software. Those
>>> policies are not meant to be water tight to begin with (and due to the
>>> nature of containers, they really can't be). But they do catch a lot
>>> and while not an actual security barrier, they certainly prevent a lot
>>> of accidents. Unfortunately with the new mediation in place, the only
>>> real way to operate moving forward is to not mediate mounts at all as
>>> mediation of the new VFS API leads to just about everything getting
>>> denied and apparmor doesn't provide a way to separately mediate the
>>> old and new mount APIs.
>>>
>> atm no, I am working on a conditional for detached mounts. I am not
>> convinced we should completely separate move_mount() mediation for
>> attached mounts.
>>
>> But to use any of that you will need a new apparmor userspace to
>> support it. LXD is using the mounts as a behavioral/advisory catch, so
>> allowing an unmediated move_mount() is not the end of the world. Apparmor
>> still needs to be able to mediate mounts for applications that aren't
>> containers and do need a tighter barrier.
>>
>> The reality with the new mount api is that atm we can only mediate
>> move_mount(), but the new mount api does add a twist with the whole
>> detached mounts. The only way to support that on older releases is
>> using a very generic mount rule.
>>
>> New userspaces can pick up more. Leaving move_mount() unmediated even
>> for older releases really isn't an appealing option. New kernels
>> get pulled back and used on old releases all the time.
>>
>>> 3) A system where apparmor is used to effectively prevent any mounting
>>> whatsoever, where the policy is now being bypassed by simply using the
>>> new VFS API. Such a system would also need to not be using Seccomp in
>>> combination with Apparmor as blocking the new VFS API syscalls would
>>> avoid the issue.
>>>
>> sure
>>
>>>
>>> I suspect 3) is what John is hinting at in this thread. I could
>>> certainly see this bug be used to bypass the snapd mount restrictions
>>> for example, though snapd also generates seccomp policies, so just
>>> blocking the new VFS API would be a much simpler fix there.
>>>
>> I am concerned about more than the snapd case, yes snapd could and
>> should use seccomp, but apparmor should be able to stand on its own
>> for other cases.
>>
>>> I have no idea how common 1) is, but it looks like we're about to find
>>> out soon and that has potential for some very "interesting" bug
>>> reports as mount related errors are usually not the easiest to
>>> understand and LSMs make them so much worse.
>>>
>> unfortunately that is always the case when fixing a mediation regression.
>> This one sat way too long that doesn't mean it shouldn't get fixed.
>>
>> Overall though, I think lxd/incus is probably the major one. Most people
>> are running targeted policies and have left most of the stuff doing
>> mounts unconfined.
> 
> We did some research with Aleksa last night and that does seem to be
> true for the profiles we could find.
> 
> However there is a bit of a catch that Docker, Kubernetes, ... pretty
> much all the application container options do support using AppArmor
> but don't provide advanced profiles, instead they put the burden on
> the user to attach profiles to specific containers.
> 
> I have no idea how many people actually do that and since they are
> individually crafted profiles by those users, we have no idea what may
> be in there and whether they got broken by this change.
> 
>>> As for 2), as it stands, we're going to have to effectively turn off
>>> any of the mount related safety nets we had in place in LXC, Incus and
>>> likely most other container runtimes out there to handle this. The
>>> usual issue with doing that kind of heavy hammer change is that once
>>> it's done, it will take a long time to undo it. That is, once AppArmor
>>> is actually capable of mediating the way it's supposed to, a lot of
>>> the profiles will have been changed to just blanket allow all mounts
>>> and they're only ever going to be changed once we're confident that
>>> the vast majority of our users are running a kernel with the fixed
>>> logic.
>>>
>> I am well aware of it. I fall on the side of leaving this completely
>> unmediated is slightly worse, but I can understand why someone
>> would choose to leave this unmediated.
>>
>>> I guess some of that could be handled better if there was a way to
>>> detect the current broken mediation of the new mount API and then
>>> later again, detect that the kernel now has working mediation,
>>> allowing for those container managers to generate accurate profile
>>> rather than have to go for the "safest" option (as far as keeping
>>> users running) and just keep allowing everything.
>>>
>>> John, is there any such detection mechanism currently present that
>>> could be used by userspace to better handle this situation?
>>>
>> sadly not for the move_mount() patch, it is however a one line patch
>>
>> diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
>> index 7465c39ad1bc..6cd52767cfeb 100644
>> --- a/security/apparmor/apparmorfs.c
>> +++ b/security/apparmor/apparmorfs.c
>> @@ -2373,6 +2373,7 @@ static struct aa_sfs_entry aa_sfs_entry_policy[] = {
>>
>>    static struct aa_sfs_entry aa_sfs_entry_mount[] = {
>>          AA_SFS_FILE_STRING("mask", "mount umount pivot_root"),
>> +       AA_SFS_FILE_STRING(move_mount, "detached"),
>>          { }
>>    };
>>
>> or similar. That I could send out today.
> 
> I think that'd be useful to have as something that can be used to
> detect this behavior from userspace and have the affected code turn
> off mount mediation if that's the case.
> 
>> it would surfaced the via the regular securityfs/apparmor interface as
>> /sys/kernel/security/apparmor/features/mount/move_mount
> 
>> I could also get you a simple conditional patch around returning ENOSYS
>> to start testing, so that we could potentially try sending a final version
>> of that patch up next week.
> 
> That'd be good. I feel that if this change had come with the ENOSYS
> behavior, I wouldn't have noticed it at all.
> With the new mount API being still pretty fresh, I'm not aware of any
> userspace which today relies exclusively on it, so having it be
> effectively disabled until such time as you can mediate it properly
> shouldn't really break anyone (always hard to know, but sure would
> break a lot less cases than the current change).
> 
> One catch though from my testing with seccomp, to make this work, you
> need to have fsmount return ENOSYS, if you only have move_mount return
> ENOSYS, it's too late and libmount just returns the failure to the
> user rather than go down the old mount API code path.
> 
>>> Overall, this still feels very rushed and broken to me. I understand
>>> that being able to trivially bypass AppArmor mount rules isn't great,
>>> but shipping code that makes the vast majority of said rules useless
>>> doesn't really feel like such an improvement. It's effectively turning
>>> what was a reasonably flexible policy engine around mount calls, into
>>> a binary switch to either allow everything or block everything.
>>>
>> I get what you are saying. The other side of the coin is oh look apparmor
>> isn't mediating mount if I just do this ...
> 
> Sure, but I suspect we could have avoided a bunch of pain if we had a
> chat around this change, had an opportunity to test it ahead of time.
> At the very least, we'd have had the flag file introduced from the
> start and hopefully some better handling like some kind of ENOSYS
> option.
> 
>>> We at the very least need a way to check whether we're dealing with
>>> this known broken state so that any change that's made to userspace
>>> can be made condition on this, ensuring that once mediation actually
>>> works as intended, those policies also go back to the way they're
>>> supposed to be.
>>>
>> agreed. That really should have been part of the patch to begin with.
> 

So this is the best solution I could come up with so far. It detects the
mount is detached and keeps the attached_disconnected flag from attaching
it to /, which was the source for detached mounts showing up as /.

Instead of a new conditional that I was working on, this lets us use the
none existent source location as the conditional, allowing older
user spaces tools to support rules detached mounts. So rules like

   allow mount,
   allow mount -> /target/,

will allow the detached mounts, while a more specific mount rule
like
   allow mount /dev/sda1 -> /target/,

won't allow the detached mount to be moved in.

On older userpace compilers its a little ugly but you can specify
allowing detached mounts by using "" as the source

   allow mount "" -> /target/,

newer userspaces will pickup a keyword `detached` to make the rules
intent a little clearer.

   allow mount detached -> /target/,


Full mediation of the new mount interface is sadly still a wip.



commit 955d94150d59a63398f50e60632e1b3ea6a827a5
Author: John Johansen <john.johansen@canonical.com>
Date:   Mon Dec 18 01:10:03 2023 -0800

     apparmor: Detect that source mount of move_mount is detached
     
     Prevent move_mount from applying the attach_disconnected flag
     to move_mount(). This prevents detached mounts from appearing
     as / when applying mount mediation, which is not only incorrect
     but could result in bad policy being generated.
     
     Basic mount rules like
       allow mount,
       allow mount options=(move) -> /target/,
     
     will allow detached mounts, allowing older policy to continue
     to function. New policy gains the ability to specify `detached` as
     a source option
       allow mount detached -> /target/,
     
     In addition make sure support of move_mount is advertised as
     a feature to userspace so that applications that generate policy
     can respond to the addition.
     
     Note: this fixes mediation of move_mount, it does not fix the
     broader regression of apparmor mediation of the new mount
     api.
     
     Link: https://lore.kernel.org/all/68c166b8-5b4d-4612-8042-1dee3334385b@leemhuis.info/T/#mb35fdde37f999f08f0b02d58dc1bf4e6b65b8da2
     Fixes: 157a3537d6bc ("apparmor: Fix regression in mount mediation")
     Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
     Signed-off-by: John Johansen <john.johansen@canonical.com>

diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 38650e52ef57..2d9f2a4b4519 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -2373,6 +2373,7 @@ static struct aa_sfs_entry aa_sfs_entry_policy[] = {
  
  static struct aa_sfs_entry aa_sfs_entry_mount[] = {
  	AA_SFS_FILE_STRING("mask", "mount umount pivot_root"),
+	AA_SFS_FILE_STRING("move_mount", "detached"),
  	{ }
  };
  
diff --git a/security/apparmor/mount.c b/security/apparmor/mount.c
index fb30204c761a..49fe8da6fea4 100644
--- a/security/apparmor/mount.c
+++ b/security/apparmor/mount.c
@@ -499,6 +499,10 @@ int aa_move_mount(const struct cred *subj_cred,
  	error = -ENOMEM;
  	if (!to_buffer || !from_buffer)
  		goto out;
+
+	if (!our_mnt(from_path->mnt))
+		/* moving a mount detached from the namespace */
+		from_path = NULL;
  	error = fn_for_each_confined(label, profile,
  			match_mnt(subj_cred, profile, to_path, to_buffer,
  				  from_path, from_buffer,



^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-01-02 22:09 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CA+enf=sWQ+-YP+uj9XfN_ykDsK=CYFFa35aPpeuS9B6qyLkjtg@mail.gmail.com>
     [not found] ` <582eb2e9-ce80-4f96-a4bc-bef1a508e0ab@canonical.com>
2023-12-04  1:34   ` Apparmor move_mount mediation breaks mount tool in containers Stéphane Graber
2023-12-04 13:14     ` John Johansen
2023-12-04 14:20       ` Linux regression tracking (Thorsten Leemhuis)
2023-12-05  6:57         ` Stéphane Graber
2023-12-05  8:45           ` Linux regression tracking (Thorsten Leemhuis)
2023-12-05 17:08             ` Christian Brauner
2023-12-05 18:34               ` John Johansen
2023-12-06 14:12                 ` Christian Brauner
2023-12-06 19:21                   ` John Johansen
2023-12-05 19:55           ` John Johansen
2023-12-06  2:18             ` Stéphane Graber
2023-12-06  3:16               ` John Johansen
2024-01-02 22:09               ` John Johansen
2023-12-04 19:39       ` Sasha Levin
2023-12-04 20:35         ` John Johansen
2023-12-05 12:24     ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-12-23  8:17       ` Linux regression tracking #update (Thorsten Leemhuis)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.