* BTRFS losing SE Linux labels on power failure or "reboot -nffd"
@ 2018-06-04 13:14 Russell Coker
  2018-06-04 13:29 ` Hans van Kranenburg
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Russell Coker @ 2018-06-04 13:14 UTC (permalink / raw)
  To: linux-btrfs

The command "reboot -nffd" (kernel reboot without flushing kernel buffers or
writing status), when run on a BTRFS system with SE Linux, will often leave
/var/log/audit/audit.log unlabeled.  Less often it also leaves some
systemd-journald files unlabeled, such as
/var/log/journal/c195779d29154ed8bcb4e8444c4a1728/system.journal.
I think the same problem afflicts both systemd-journald and auditd, but it's a
race condition that on my systems (both production and test) is more likely to
affect auditd.

root@stretch:/# xattr -l /var/log/audit/audit.log 
security.selinux: 
0000   73 79 73 74 65 6D 5F 75 3A 6F 62 6A 65 63 74 5F    system_u:object_ 
0010   72 3A 61 75 64 69 74 64 5F 6C 6F 67 5F 74 3A 73    r:auditd_log_t:s 
0020   30 00                                              0.

SE Linux stores its labels in the xattr "security.selinux".  You can inspect
it with xattr(1), but "ls -Z" is generally easiest.
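
For anyone who wants to turn the hex dump above back into a label by hand, here
is a minimal sketch.  The function name hex_to_label is made up for
illustration and is not part of any SE Linux tool.  It assumes space-separated
hex bytes as printed by "xattr -l" and leaves out the trailing 00 byte that
terminates the stored value.

```shell
# hex_to_label: hypothetical helper, for illustration only.
# Takes space-separated hex bytes and prints the text they encode,
# skipping NUL (00) bytes.
hex_to_label() {
    out=""
    for byte in $1; do
        # The stored value ends with a NUL terminator; leave it out.
        [ "$byte" = "00" ] && continue
        # Convert the hex byte to a 3-digit octal escape and let printf
        # expand it into the character.
        out="$out$(printf "\\$(printf '%03o' "0x$byte")")"
    done
    printf '%s\n' "$out"
}
```

Feeding it the 34 bytes from the dump prints the label without the terminator,
which is the same string that "ls -Z" shows for the file.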

If this issue only affected "reboot -nffd" then a solution might be to just
not run that command.  However, it also affects systems after a power outage.
 
I have reproduced this bug with kernel 4.9.0-6-amd64 (the latest security 
update for Debian/Stretch which is the latest supported release of Debian).  I 
have also reproduced it in an identical manner with kernel 4.16.0-1-amd64 (the 
latest from Debian/Unstable).  For testing I reproduced this with a 4G 
filesystem in a VM, but in production it has happened on BTRFS RAID-1 arrays, 
both SSD and HDD.
 
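#!/bin/bash
set -e
# Count running auditd processes.  The [s] keeps grep from matching itself;
# the pattern is quoted so the shell cannot expand it as a glob.
COUNT=$(ps aux | grep '[s]bin/auditd' | wc -l)
date
if [ "$COUNT" = "1" ]; then
 echo "all good"
else
 echo "failed"
 exit 1
fi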

The above is the script /usr/local/sbin/testit.  I test whether auditd is
running because, when SE Linux is in enforcing mode, an incorrect or missing
label on the audit.log file causes auditd to abort.
 
root@stretch:~# ls -liZ /var/log/audit/audit.log 
37952 -rw-------. 1 root root system_u:object_r:auditd_log_t:s0 4385230 Jun  1 12:23 /var/log/audit/audit.log
Above is before I do the tests.
 
while ssh stretch /usr/local/sbin/testit ; do 
 ssh btrfs-local "reboot -nffd" > /dev/null 2>&1 & 
 sleep 20 
done
Above is the shell code I run to do the tests.  Note that the VM in question 
runs on SSD storage which is why it can consistently boot in less than 20 
seconds.
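
The fixed 20 second sleep only works if the VM always boots in time.  A sketch
of an alternative is to poll until a probe command succeeds.  The name
wait_until and its arguments are hypothetical, and the choice of probe is left
to the caller.

```shell
# wait_until: hypothetical helper, for illustration only.
# Runs the given command about once per second until it succeeds or the
# number of tries is used up.
# Usage: wait_until TRIES COMMAND [ARGS...]
wait_until() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

In the loop above, something like "wait_until 60 ssh -o ConnectTimeout=2
stretch true" could stand in for "sleep 20", with testit run once the host
answers again.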
 
Fri  1 Jun 12:26:13 UTC 2018 
all good 
Fri  1 Jun 12:26:33 UTC 2018 
failed
Above is the output from the shell code in question.  After the first reboot 
it fails.  The probability of failure on my test system is greater than 50%.
 
root@stretch:~# ls -liZ /var/log/audit/audit.log  
37952 -rw-------. 1 root root system_u:object_r:unlabeled_t:s0 4396803 Jun  1 12:26 /var/log/audit/audit.log
Now the result.  Note that the inode has not changed.  I could understand a
newly created file missing an xattr, but this is an existing file which
shouldn't have had its xattr changed.  But somehow it gets corrupted.
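
The before/after comparison can be done mechanically.  The sketch below is an
illustration only: same_inode_new_label is a made-up name, and it assumes each
argument is one unwrapped line of "ls -liZ" output in the layout shown in this
message, with the inode in field 1 and the context in field 6.

```shell
# same_inode_new_label: hypothetical helper, for illustration only.
# Takes two single-line "ls -liZ" outputs for the same path, captured
# before and after the reboot.  Succeeds when the inode number (field 1)
# is unchanged but the SE Linux context (field 6) differs.
same_inode_new_label() {
    before_inode=$(printf '%s\n' "$1" | awk '{print $1}')
    before_ctx=$(printf '%s\n' "$1" | awk '{print $6}')
    after_inode=$(printf '%s\n' "$2" | awk '{print $1}')
    after_ctx=$(printf '%s\n' "$2" | awk '{print $6}')
    [ "$before_inode" = "$after_inode" ] && [ "$before_ctx" != "$after_ctx" ]
}
```

Called with the two listings from this message it succeeds, because inode 37952
is the same in both while the context went from auditd_log_t to unlabeled_t.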
 
The first possibility I considered was that SE Linux code might be at fault.  
I asked on the SE Linux mailing list (I haven't been involved in SE Linux 
kernel code for about 15 years) and was informed that this isn't likely at 
all.  There have been no problems like this reported with other filesystems.
 
Does anyone have any ideas of other tests I should run?  Anyone want me to try 
a different kernel?  I can give root on a VM to anyone who wants to poke at 
it.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/





* Re: BTRFS losing SE Linux labels on power failure or "reboot -nffd"
  2018-06-04 13:14 BTRFS losing SE Linux labels on power failure or "reboot -nffd" Russell Coker
@ 2018-06-04 13:29 ` Hans van Kranenburg
  2018-06-04 13:59   ` Holger Hoffstätte
  2018-06-06 10:22 ` Russell Coker
  2018-06-06 10:59 ` Russell Coker
  2 siblings, 1 reply; 9+ messages in thread
From: Hans van Kranenburg @ 2018-06-04 13:29 UTC (permalink / raw)
  To: Russell Coker, linux-btrfs

Hi,

On 06/04/2018 03:14 PM, Russell Coker wrote:
> The command "reboot -nffd" (kernel reboot without flushing kernel buffers or 
> writing status) when run on a BTRFS system with SE Linux will often result in 
> /var/log/audit/audit.log being unlabeled.

This recent fix might be what you're looking for:

https://www.spinics.net/lists/linux-btrfs/msg77927.html


-- 
Hans van Kranenburg


* Re: BTRFS losing SE Linux labels on power failure or "reboot -nffd"
  2018-06-04 13:29 ` Hans van Kranenburg
@ 2018-06-04 13:59   ` Holger Hoffstätte
  0 siblings, 0 replies; 9+ messages in thread
From: Holger Hoffstätte @ 2018-06-04 13:59 UTC (permalink / raw)
  To: Hans van Kranenburg, Russell Coker, linux-btrfs

On 06/04/18 15:29, Hans van Kranenburg wrote:
> Hi,
> 
> On 06/04/2018 03:14 PM, Russell Coker wrote:
>> The command "reboot -nffd" (kernel reboot without flushing kernel buffers or
>> writing status) when run on a BTRFS system with SE Linux will often result in
>> /var/log/audit/audit.log being unlabeled.
> 
> This recent fix might be what you're looking for:
> 
> https://www.spinics.net/lists/linux-btrfs/msg77927.html

..which has been in 4.16 since 4.16.11. :)

-h


* Re: BTRFS losing SE Linux labels on power failure or "reboot -nffd"
  2018-06-04 13:14 BTRFS losing SE Linux labels on power failure or "reboot -nffd" Russell Coker
  2018-06-04 13:29 ` Hans van Kranenburg
@ 2018-06-06 10:22 ` Russell Coker
  2018-06-06 10:59 ` Russell Coker
  2 siblings, 0 replies; 9+ messages in thread
From: Russell Coker @ 2018-06-06 10:22 UTC (permalink / raw)
  To: linux-btrfs

https://www.spinics.net/lists/linux-btrfs/msg77927.html

Thanks to Hans van Kranenburg and Holger Hoffstätte.  The above is a link to
the message with the patch for this.  The patch is already included in kernel
4.16.11, which was uploaded to Debian on the 27th of May and got into testing
at about the time that my message got to the SE Linux list.

The kernel from Debian/Stable still has the issue, so using a testing kernel
might be a good way to deal with this problem at the moment.
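
To check a given upstream version against 4.16.11, a comparison built on GNU
"sort -V" is one option.  This is only a sketch: version_at_least is a made-up
name, and it says nothing about fixes backported to older stable branches,
since this thread only confirms 4.16.11.  Note also that Debian's "uname -r"
prints an ABI name such as 4.16.0-1-amd64, which need not match the upstream
point release, so the upstream version of the kernel package is the value to
pass in.

```shell
# version_at_least: hypothetical helper, for illustration only.
# Succeeds when version $1 is equal to or newer than version $2,
# using the version ordering of GNU "sort -V".
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

For example, "version_at_least 4.16.12 4.16.11" succeeds, while the same call
with 4.9.88 as the first argument fails.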


-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/





* Re: BTRFS losing SE Linux labels on power failure or "reboot -nffd"
  2018-06-04 13:14 BTRFS losing SE Linux labels on power failure or "reboot -nffd" Russell Coker
  2018-06-04 13:29 ` Hans van Kranenburg
  2018-06-06 10:22 ` Russell Coker
@ 2018-06-06 10:59 ` Russell Coker
  2 siblings, 0 replies; 9+ messages in thread
From: Russell Coker @ 2018-06-06 10:59 UTC (permalink / raw)
  To: selinux

https://www.spinics.net/lists/linux-btrfs/msg77927.html

Thanks to Hans van Kranenburg and Holger Hoffstätte.  The above is a link to
the message with the patch for this.  The patch is already included in kernel
4.16.11, which was uploaded to Debian on the 27th of May and got into testing
at about the time that my message got to the SE Linux list.

The kernel from Debian/Stable still has the issue, so using a testing kernel
might be a good way to deal with this problem at the moment.


-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


* Re: BTRFS losing SE Linux labels on power failure or "reboot -nffd".
  2018-06-01 13:03 Russell Coker
  2018-06-02 18:18 ` Nick Kralevich
@ 2018-06-04 12:44 ` Stephen Smalley
  1 sibling, 0 replies; 9+ messages in thread
From: Stephen Smalley @ 2018-06-04 12:44 UTC (permalink / raw)
  To: Russell Coker, selinux

On 06/01/2018 09:03 AM, Russell Coker via Selinux wrote:
> Could this be the fault of SE Linux code? I don't think it's likely but this is what the BTRFS developers will ask so it's best to discuss this here before sending it to them.

No, that's definitely a filesystem bug.  It is the filesystem's responsibility to ensure that new inodes are assigned a security.* xattr in the same transaction as the file creation (ext[234] does this, for example via ext4_init_security()), and that they don't lose them.  SELinux just provides the xattr suffix ("selinux") and the value/value_len pair.

> 
>  
> 
> Does anyone have any ideas of other tests I should run? Anyone want me to try a different kernel? I can give root on a VM to anyone who wants to poke at it. Anything else I should add when sending this to the BTRFS developers?


* Re: BTRFS losing SE Linux labels on power failure or "reboot -nffd".
  2018-06-02 18:18 ` Nick Kralevich
@ 2018-06-03  7:20   ` Russell Coker
  0 siblings, 0 replies; 9+ messages in thread
From: Russell Coker @ 2018-06-03  7:20 UTC (permalink / raw)
  To: Nick Kralevich; +Cc: SELinux

On Sunday, 3 June 2018 4:18:09 AM AEST Nick Kralevich wrote:
> Does BTRFS have the equivalent of an fsck command which is run on
> boot? I've seen similar problems before where fsck tries fixing up the
> filesystem after an unclean shutdown, and the SELinux labels aren't
> properly handled by the fsck program.

https://en.wikipedia.org/wiki/Btrfs

BTRFS is designed to be self-healing and has no equivalent to the Ext2/3/4
fsck program.  /bin/fsck.btrfs is a shell script that returns 8 if the device
doesn't exist and 0 in all other cases.  It has a comment saying "You should
set fs_passno to 0".
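
As an illustration of the behaviour just described, here is a sketch written
from that description.  It is not the script shipped with btrfs-progs, and the
function name is made up.

```shell
# fsck_btrfs_sketch: illustration of the described behaviour only,
# NOT the actual /bin/fsck.btrfs script.
fsck_btrfs_sketch() {
    dev=$1
    if [ ! -e "$dev" ]; then
        echo "$dev does not exist" >&2
        return 8
    fi
    # Nothing is checked or repaired from user space here; recovery is
    # left to the kernel.
    return 0
}
```

The point is that a zero exit status from such a script says nothing about the
state of the filesystem, so it cannot be what drops the labels.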

The design of BTRFS is similar to ZFS in that you expect to be able to push 
reset and have the system just work again with kernel code doing the recovery.

It's definitely a kernel bug.  The question is where the bug is and who will 
fix it.

Thanks for the suggestion though.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


* Re: BTRFS losing SE Linux labels on power failure or "reboot -nffd".
  2018-06-01 13:03 Russell Coker
@ 2018-06-02 18:18 ` Nick Kralevich
  2018-06-03  7:20   ` Russell Coker
  2018-06-04 12:44 ` Stephen Smalley
  1 sibling, 1 reply; 9+ messages in thread
From: Nick Kralevich @ 2018-06-02 18:18 UTC (permalink / raw)
  To: russell; +Cc: SELinux

Does BTRFS have the equivalent of an fsck command which is run on
boot? I've seen similar problems before where fsck tries fixing up the
filesystem after an unclean shutdown, and the SELinux labels aren't
properly handled by the fsck program.

I have no idea if this is related to your problem or not. Just more
food for thought.

-- Nick

-- 
Nick Kralevich | Fuchsia Security | nnk@google.com | 650.214.4037


* BTRFS losing SE Linux labels on power failure or "reboot -nffd".
@ 2018-06-01 13:03 Russell Coker
  2018-06-02 18:18 ` Nick Kralevich
  2018-06-04 12:44 ` Stephen Smalley
  0 siblings, 2 replies; 9+ messages in thread
From: Russell Coker @ 2018-06-01 13:03 UTC (permalink / raw)
  To: selinux

The command "reboot -nffd" (kernel reboot without flushing kernel buffers or writing
status), when run on a BTRFS system, will often leave /var/log/audit/audit.log
unlabeled.  Less often it also leaves some systemd-journald files unlabeled, such as
/var/log/journal/c195779d29154ed8bcb4e8444c4a1728/system.journal.  I think the same
problem afflicts both systemd-journald and auditd, but it's a race condition that on
my systems (both production and test) is more likely to affect auditd.

If this issue just affected "reboot -nffd" then a solution might be to just not run that 
command.  However this affects systems after a power outage.

I have reproduced this bug with kernel 4.9.0-6-amd64 (the latest security update for
Debian/Stretch which is the latest supported release of Debian).  I have also reproduced it
in an identical manner with kernel 4.16.0-1-amd64 (the latest from Debian/Unstable).  For
testing I reproduced this with a 4G filesystem in a VM, but in production it has happened
on BTRFS RAID-1 arrays, both SSD and HDD.

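#!/bin/bash
set -e
# Count running auditd processes.  The [s] keeps grep from matching itself;
# the pattern is quoted so the shell cannot expand it as a glob.
COUNT=$(ps aux | grep '[s]bin/auditd' | wc -l)
date
if [ "$COUNT" = "1" ]; then
 echo "all good"
else
 echo "failed"
 exit 1
fi

The above is the script /usr/local/sbin/testit.  I test whether auditd is running
because it aborts if the context on its log file is wrong.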

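root@stretch:~# ls -liZ /var/log/audit/audit.log
37952 -rw-------. 1 root root system_u:object_r:auditd_log_t:s0 4385230 Jun  1 12:23 /var/log/audit/audit.log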

Above is before I do the tests.

while ssh stretch /usr/local/sbin/testit ; do
 ssh btrfs-local "reboot -nffd" > /dev/null 2>&1 &
 sleep 20
done

Above is the shell code I run to do the tests.  Note that the VM in question runs on SSD 
storage which is why it can consistently boot in less than 20 seconds.

Fri  1 Jun 12:26:13 UTC 2018
all good
Fri  1 Jun 12:26:33 UTC 2018
failed

Above is the output from the shell code in question.  After the first reboot it fails.  The 
probability of failure on my test system is greater than 50%.

root@stretch:~# ls -liZ /var/log/audit/audit.log
37952 -rw-------. 1 root root system_u:object_r:unlabeled_t:s0 4396803 Jun  1 12:26 /var/log/audit/audit.log

Now the result.  Note that the inode has not changed.  I could understand a newly created
file missing an xattr, but this is an existing file which shouldn't have had its xattr changed.
But somehow it gets corrupted.

Could this be the fault of SE Linux code?  I don't think it's likely but this is what the BTRFS 
developers will ask so it's best to discuss this here before sending it to them.

Does anyone have any ideas of other tests I should run?  Anyone want me to try a 
different kernel?  I can give root on a VM to anyone who wants to poke at it.  Anything else 
I should add when sending this to the BTRFS developers?

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

