* Failure in sel_netport_sid_slow()
@ 2016-04-13 18:00 Greg
  2016-04-13 19:10 ` Stephen Smalley
  0 siblings, 1 reply; 6+ messages in thread
From: Greg @ 2016-04-13 18:00 UTC (permalink / raw)
  To: selinux


Hi SELinux developers,

I'm seeing "kernel: SELinux: failure in sel_netport_sid_slow(), unable to
determine network port label" under certain conditions.

OS -

Linux test33 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 7.2.1511 (Core)

Versions -

libselinux-2.2.2-6.el7.x86_64
libselinux-devel-2.2.2-6.el7.x86_64
libselinux-python-2.2.2-6.el7.x86_64
libselinux-utils-2.2.2-6.el7.x86_64
libsemanage-2.1.10-18.el7.x86_64
libsemanage-python-2.1.10-18.el7.x86_64
libsepol-2.1.9-3.el7.x86_64
libsepol-devel-2.1.9-3.el7.x86_64
selinux-policy-3.13.1-60.el7_2.3.noarch
selinux-policy-targeted-3.13.1-60.el7_2.3.noarch

SE Status -

SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

Scenario -

I run two Ansible roles: one deploys JupyterHub (a simple python+node app),
which binds to some ports, and another immediately after (in the same
Ansible playbook) deploys Apache HTTPd (acting as a reverse proxy for
JupyterHub).

The JupyterHub role starts the JupyterHub app (a systemd unit) as its
_last_ task, and the reverse proxy role adjusts 2 SE booleans as its
_first_ task.

What happens is the 2 SE booleans get adjusted while JupyterHub is still
starting and trying to bind to its ports, which causes the SELinux error
shown above (and below in logs).

My current workaround is simply to wait at the end of my JupyterHub role
until it has fully bound to its ports and is listening (just 2-3 seconds),
then continue with the reverse proxy role, but I wonder if there's a
potential problem / race condition in SELinux?


Here's the log:

-- Last task of the Ansible JupyterHub role, which is to start its service
(ignore the 'Stopped'; that's due to Ansible's state=restarted, but the app
is not running at this time, and has never run on this host) --

Apr 12 15:39:53 test33 ansible-<stdin>[14326]: Invoked with name=jupyterhub
pattern=None enabled=True state=restarted sleep=None arguments=
runlevel=default
Apr 12 15:39:53 test33 systemd[1]: Reloading.
Apr 12 15:39:53 test33 systemd[1]: Configuration file
/usr/lib/systemd/system/auditd.service is marked world-inaccessible. This
has no effect as configuration data is accessible via APIs without
restrictions. Proceeding anyway.
Apr 12 15:39:53 test33 systemd[1]: Stopped JupyterHub Service.
Apr 12 15:39:53 test33 systemd[1]: Started JupyterHub Service.
Apr 12 15:39:53 test33 systemd[1]: Starting JupyterHub Service...

-- Reverse proxy role comes in, setting 'httpd_can_network_connect' and
'httpd_can_network_relay' bools, but JupyterHub startup isn't done yet --

Apr 12 15:39:53 test33 ansible-<stdin>[14367]: Invoked with warn=True
executable=None chdir=None _raw_params=setsebool -P
httpd_can_network_connect 1 && setsebool -P httpd_can_network_relay 1
removes=None creates=None _uses_shell=True
Apr 12 15:39:54 test33 dbus[537]: avc:  received policyload notice (seqno=4)
Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: avc:  received
policyload notice (seqno=4)
Apr 12 15:39:54 test33 dbus[537]: [system] Reloaded configuration
Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
configuration
Apr 12 15:39:54 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
Apr 12 15:39:54 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
Apr 12 15:39:54 test33 kernel: SELinux:  8 users, 105 roles, 4965 types,
304 bools, 1 sens, 1024 cats
Apr 12 15:39:54 test33 kernel: SELinux:  83 classes, 105676 rules

-- JupyterHub continues with its startup, but cannot bind, failing with an
odd memory error, and the kernel reports an SELinux network port label
issue, which I'm assuming is the cause of JupyterHub failing to bind --

Apr 12 15:39:54 test33 jupyterhub[14352]: [I 2016-04-12 15:39:54.340
JupyterHub app:572] Writing cookie_secret to
/etc/jupyterhub/jupyterhub_cookie_secret
Apr 12 15:39:54 test33 jupyterhub[14352]: [W 2016-04-12 15:39:54.426
JupyterHub app:292]
Apr 12 15:39:54 test33 jupyterhub[14352]: Generating
CONFIGPROXY_AUTH_TOKEN. Restarting the Hub will require restarting the
proxy.
Apr 12 15:39:54 test33 jupyterhub[14352]: Set CONFIGPROXY_AUTH_TOKEN env or
JupyterHub.proxy_auth_token config to avoid this message.
Apr 12 15:39:54 test33 jupyterhub[14352]: [W 2016-04-12 15:39:54.438
JupyterHub app:685] No admin users, admin interface will be unavailable.
Apr 12 15:39:54 test33 jupyterhub[14352]: [W 2016-04-12 15:39:54.438
JupyterHub app:686] Add any administrative users to
`c.Authenticator.admin_users` in config.
Apr 12 15:39:54 test33 jupyterhub[14352]: [I 2016-04-12 15:39:54.438
JupyterHub app:712] Not using whitelist. Any authenticated user will be
allowed.
Apr 12 15:39:54 test33 kernel: SELinux: failure in sel_netport_sid_slow(),
unable to determine network port label
Apr 12 15:39:54 test33 jupyterhub[14352]: [E 2016-04-12 15:39:54.452
JupyterHub app:1110] Failed to bind hub to http://127.0.0.1:8081/hub/
Apr 12 15:39:54 test33 jupyterhub[14352]: [E 2016-04-12 15:39:54.452
JupyterHub app:1178]
Apr 12 15:39:54 test33 jupyterhub[14352]: Traceback (most recent call last):
Apr 12 15:39:54 test33 jupyterhub[14352]: File
"/usr/local/anaconda3/lib/python3.5/site-packages/jupyterhub/app.py", line
1176, in launch_instance_async
Apr 12 15:39:54 test33 jupyterhub[14352]: yield self.start()
Apr 12 15:39:54 test33 jupyterhub[14352]: File
"/usr/local/anaconda3/lib/python3.5/site-packages/jupyterhub/app.py", line
1108, in start
Apr 12 15:39:54 test33 jupyterhub[14352]:
self.http_server.listen(self.hub_port, address=self.hub_ip)
Apr 12 15:39:54 test33 jupyterhub[14352]: File
"/usr/local/anaconda3/lib/python3.5/site-packages/tornado/tcpserver.py",
line 126, in listen
Apr 12 15:39:54 test33 jupyterhub[14352]: sockets = bind_sockets(port,
address=address)
Apr 12 15:39:54 test33 jupyterhub[14352]: File
"/usr/local/anaconda3/lib/python3.5/site-packages/tornado/netutil.py", line
196, in bind_sockets
Apr 12 15:39:54 test33 jupyterhub[14352]: sock.bind(sockaddr)
Apr 12 15:39:54 test33 jupyterhub[14352]: OSError: [Errno 12] Cannot
allocate memory
Apr 12 15:39:54 test33 systemd[1]: jupyterhub.service: main process exited,
code=exited, status=1/FAILURE
Apr 12 15:39:54 test33 systemd[1]: Unit jupyterhub.service entered failed
state.
Apr 12 15:39:54 test33 systemd[1]: jupyterhub.service failed.

-- The 2 SE bools are set; I'm not sure why there appears to be a reload
every time, or whether that's standard --

Apr 12 15:39:54 test33 dbus[537]: avc:  received policyload notice (seqno=5)
Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: avc:  received
policyload notice (seqno=5)
Apr 12 15:39:54 test33 dbus[537]: [system] Reloaded configuration
Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
configuration
Apr 12 15:39:55 test33 setsebool[14369]: The httpd_can_network_connect
policy boolean was changed to 1 by root
Apr 12 15:39:55 test33 dbus[537]: avc:  received policyload notice (seqno=6)
Apr 12 15:39:55 test33 dbus-daemon[537]: dbus[537]: avc:  received
policyload notice (seqno=6)
Apr 12 15:39:55 test33 dbus[537]: [system] Reloaded configuration
Apr 12 15:39:55 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
configuration
Apr 12 15:39:55 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
Apr 12 15:39:55 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
Apr 12 15:39:55 test33 kernel: SELinux:  8 users, 105 roles, 4965 types,
304 bools, 1 sens, 1024 cats
Apr 12 15:39:55 test33 kernel: SELinux:  83 classes, 105676 rules
Apr 12 15:39:56 test33 dbus[537]: avc:  received policyload notice (seqno=7)
Apr 12 15:39:56 test33 dbus-daemon[537]: dbus[537]: avc:  received
policyload notice (seqno=7)
Apr 12 15:39:56 test33 dbus[537]: [system] Reloaded configuration
Apr 12 15:39:56 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
configuration
Apr 12 15:39:56 test33 setsebool[14379]: The httpd_can_network_relay policy
boolean was changed to 1 by root


As mentioned above, if I let JupyterHub fully bind to its ports before
'setsebool' is issued in the follow-up Ansible role, everything works
absolutely fine, including restarting JupyterHub, and it works after
rebooting etc.: all healthy and normal, with no SELinux errors.

Also, there are no policy violations of any kind when JupyterHub fails to
bind; everything looks clean except the kernel message in the log and the
ultimate failure to run:

# ausearch -m avc
<no matches>

Thanks,

Greg Kubok.



* Re: Failure in sel_netport_sid_slow()
  2016-04-13 18:00 Failure in sel_netport_sid_slow() Greg
@ 2016-04-13 19:10 ` Stephen Smalley
  2016-04-13 19:31   ` Stephen Smalley
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Smalley @ 2016-04-13 19:10 UTC (permalink / raw)
  To: Greg, selinux, Paul Moore

On 04/13/2016 02:00 PM, Greg wrote:
> Hi SELinux developers,
> 
> I'm seeing "kernel: SELinux: failure in sel_netport_sid_slow(), unable
> to determine network port label" under certain conditions.
> 
> OS -
> 
> Linux test33 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
> CentOS Linux release 7.2.1511 (Core)
> 
> Versions -
> 
> libselinux-2.2.2-6.el7.x86_64
> libselinux-devel-2.2.2-6.el7.x86_64
> libselinux-python-2.2.2-6.el7.x86_64
> libselinux-utils-2.2.2-6.el7.x86_64
> libsemanage-2.1.10-18.el7.x86_64
> libsemanage-python-2.1.10-18.el7.x86_64
> libsepol-2.1.9-3.el7.x86_64
> libsepol-devel-2.1.9-3.el7.x86_64
> selinux-policy-3.13.1-60.el7_2.3.noarch
> selinux-policy-targeted-3.13.1-60.el7_2.3.noarch
> 
> SE Status -
> 
> SELinux status:                 enabled
> SELinuxfs mount:                /sys/fs/selinux
> SELinux root directory:         /etc/selinux
> Loaded policy name:             targeted
> Current mode:                   enforcing
> Mode from config file:          enforcing
> Policy MLS status:              enabled
> Policy deny_unknown status:     allowed
> Max kernel policy version:      28
> 
> Scenario -
> 
> I run two Ansible roles, one deploys JupyterHub (a simple python+node
> app) which binds to some ports, and another immediately after (same
> Ansible playbook) that deploys Apache HTTPd (acting as reverse proxy for
> JupyterHub).
> 
> The JupyterHub role starts the JupyterHub app (a systemd unit) as its
> _last_ task, and the reverse proxy role adjusts 2 SE booleans as its
> _first_ task.
> 
> What happens is the 2 SE booleans get adjusted while JupyterHub is still
> starting and trying to bind to its ports, which causes the SELinux error
> shown above (and below in logs).
> 
> My current workaround is simply to wait at the end of my JupyterHub role
> until it has fully bound to its ports and listening (just 2-3 seconds),
> then continue with the reverse proxy role, but I wonder if there's a
> potential problem / race condition in SELinux?
> 
> 
> Here's the log:
> 
> -- Last task of Ansible JupyterHub role - which is to start its service
> (ignore the 'Stopped', that's due to Ansible's state=restarted, but the
> app is not running at this time, and has never run on this host) --
> 
> Apr 12 15:39:53 test33 ansible-<stdin>[14326]: Invoked with
> name=jupyterhub pattern=None enabled=True state=restarted sleep=None
> arguments= runlevel=default
> Apr 12 15:39:53 test33 systemd[1]: Reloading.
> Apr 12 15:39:53 test33 systemd[1]: Configuration file
> /usr/lib/systemd/system/auditd.service is marked world-inaccessible.
> This has no effect as configuration data is accessible via APIs without
> restrictions. Proceeding anyway.
> Apr 12 15:39:53 test33 systemd[1]: Stopped JupyterHub Service.
> Apr 12 15:39:53 test33 systemd[1]: Started JupyterHub Service.
> Apr 12 15:39:53 test33 systemd[1]: Starting JupyterHub Service...
> 
> -- Reverse proxy role comes in, setting 'httpd_can_network_connect' and
> 'httpd_can_network_relay' bools, but JupyterHub startup isn't done yet --
> 
> Apr 12 15:39:53 test33 ansible-<stdin>[14367]: Invoked with warn=True
> executable=None chdir=None _raw_params=setsebool -P
> httpd_can_network_connect 1 && setsebool -P httpd_can_network_relay 1
> removes=None creates=None _uses_shell=True
> Apr 12 15:39:54 test33 dbus[537]: avc:  received policyload notice (seqno=4)
> Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: avc:  received
> policyload notice (seqno=4)
> Apr 12 15:39:54 test33 dbus[537]: [system] Reloaded configuration
> Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
> configuration
> Apr 12 15:39:54 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
> Apr 12 15:39:54 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
> Apr 12 15:39:54 test33 kernel: SELinux:  8 users, 105 roles, 4965 types,
> 304 bools, 1 sens, 1024 cats
> Apr 12 15:39:54 test33 kernel: SELinux:  83 classes, 105676 rules
> 
> -- JupyterHub continues with its startup, but cannot bind with an odd
> memory error, and kernel reports SELinux network port label issue, which
> I'm assuming is the cause of JupyterHub failing to bind --
> 
> Apr 12 15:39:54 test33 jupyterhub[14352]: [I 2016-04-12 15:39:54.340
> JupyterHub app:572] Writing cookie_secret to
> /etc/jupyterhub/jupyterhub_cookie_secret
> Apr 12 15:39:54 test33 jupyterhub[14352]: [W 2016-04-12 15:39:54.426
> JupyterHub app:292]
> Apr 12 15:39:54 test33 jupyterhub[14352]: Generating
> CONFIGPROXY_AUTH_TOKEN. Restarting the Hub will require restarting the
> proxy.
> Apr 12 15:39:54 test33 jupyterhub[14352]: Set CONFIGPROXY_AUTH_TOKEN env
> or JupyterHub.proxy_auth_token config to avoid this message.
> Apr 12 15:39:54 test33 jupyterhub[14352]: [W 2016-04-12 15:39:54.438
> JupyterHub app:685] No admin users, admin interface will be unavailable.
> Apr 12 15:39:54 test33 jupyterhub[14352]: [W 2016-04-12 15:39:54.438
> JupyterHub app:686] Add any administrative users to
> `c.Authenticator.admin_users` in config.
> Apr 12 15:39:54 test33 jupyterhub[14352]: [I 2016-04-12 15:39:54.438
> JupyterHub app:712] Not using whitelist. Any authenticated user will be
> allowed.
> Apr 12 15:39:54 test33 kernel: SELinux: failure in
> sel_netport_sid_slow(), unable to determine network port label
> Apr 12 15:39:54 test33 jupyterhub[14352]: [E 2016-04-12 15:39:54.452
> JupyterHub app:1110] Failed to bind hub to http://127.0.0.1:8081/hub/
> Apr 12 15:39:54 test33 jupyterhub[14352]: [E 2016-04-12 15:39:54.452
> JupyterHub app:1178]
> Apr 12 15:39:54 test33 jupyterhub[14352]: Traceback (most recent call last):
> Apr 12 15:39:54 test33 jupyterhub[14352]: File
> "/usr/local/anaconda3/lib/python3.5/site-packages/jupyterhub/app.py",
> line 1176, in launch_instance_async
> Apr 12 15:39:54 test33 jupyterhub[14352]: yield self.start()
> Apr 12 15:39:54 test33 jupyterhub[14352]: File
> "/usr/local/anaconda3/lib/python3.5/site-packages/jupyterhub/app.py",
> line 1108, in start
> Apr 12 15:39:54 test33 jupyterhub[14352]:
> self.http_server.listen(self.hub_port, address=self.hub_ip)
> Apr 12 15:39:54 test33 jupyterhub[14352]: File
> "/usr/local/anaconda3/lib/python3.5/site-packages/tornado/tcpserver.py",
> line 126, in listen
> Apr 12 15:39:54 test33 jupyterhub[14352]: sockets = bind_sockets(port,
> address=address)
> Apr 12 15:39:54 test33 jupyterhub[14352]: File
> "/usr/local/anaconda3/lib/python3.5/site-packages/tornado/netutil.py",
> line 196, in bind_sockets
> Apr 12 15:39:54 test33 jupyterhub[14352]: sock.bind(sockaddr)
> Apr 12 15:39:54 test33 jupyterhub[14352]: OSError: [Errno 12] Cannot
> allocate memory
> Apr 12 15:39:54 test33 systemd[1]: jupyterhub.service: main process
> exited, code=exited, status=1/FAILURE
> Apr 12 15:39:54 test33 systemd[1]: Unit jupyterhub.service entered
> failed state.
> Apr 12 15:39:54 test33 systemd[1]: jupyterhub.service failed.
> 
> -- The 2 SE bools are set, not sure why there appears to be a reload
> every time and whether that's standard --
> 
> Apr 12 15:39:54 test33 dbus[537]: avc:  received policyload notice (seqno=5)
> Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: avc:  received
> policyload notice (seqno=5)
> Apr 12 15:39:54 test33 dbus[537]: [system] Reloaded configuration
> Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
> configuration
> Apr 12 15:39:55 test33 setsebool[14369]: The httpd_can_network_connect
> policy boolean was changed to 1 by root
> Apr 12 15:39:55 test33 dbus[537]: avc:  received policyload notice (seqno=6)
> Apr 12 15:39:55 test33 dbus-daemon[537]: dbus[537]: avc:  received
> policyload notice (seqno=6)
> Apr 12 15:39:55 test33 dbus[537]: [system] Reloaded configuration
> Apr 12 15:39:55 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
> configuration
> Apr 12 15:39:55 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
> Apr 12 15:39:55 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
> Apr 12 15:39:55 test33 kernel: SELinux:  8 users, 105 roles, 4965 types,
> 304 bools, 1 sens, 1024 cats
> Apr 12 15:39:55 test33 kernel: SELinux:  83 classes, 105676 rules
> Apr 12 15:39:56 test33 dbus[537]: avc:  received policyload notice (seqno=7)
> Apr 12 15:39:56 test33 dbus-daemon[537]: dbus[537]: avc:  received
> policyload notice (seqno=7)
> Apr 12 15:39:56 test33 dbus[537]: [system] Reloaded configuration
> Apr 12 15:39:56 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
> configuration
> Apr 12 15:39:56 test33 setsebool[14379]: The httpd_can_network_relay
> policy boolean was changed to 1 by root
> 
> 
> As mentioned above, if I let JupyterHub fully bind to its ports before
> 'setsebool' is issued in the follow-up Ansible role, everything works
> absolutely fine, including restarting JupyterHub, and it works after
> rebooting etc, all healthy and normal and no SELinux errors.
> 
> Also, there are no policy violations of any kind when JupyterHub fails
> to bind, all looks clean except the kernel message in the log an
> ultimate failure to run:
> 
> # ausearch -m avc
> <no matches>

Thanks for the report.  Is your system under memory pressure?
That's the only situation where I could see sel_netport_sid_slow()
failing with ENOMEM.  The policyload notice is normal; on a boolean
change, the kernel triggers the same userspace notification as for a
policy reload so that userspace will flush its AVC.


* Re: Failure in sel_netport_sid_slow()
  2016-04-13 19:10 ` Stephen Smalley
@ 2016-04-13 19:31   ` Stephen Smalley
  2016-04-13 21:36     ` Paul Moore
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Smalley @ 2016-04-13 19:31 UTC (permalink / raw)
  To: Greg, selinux, Paul Moore

On 04/13/2016 03:10 PM, Stephen Smalley wrote:
> On 04/13/2016 02:00 PM, Greg wrote:
>> Hi SELinux developers,
>>
>> I'm seeing "kernel: SELinux: failure in sel_netport_sid_slow(), unable
>> to determine network port label" under certain conditions.
>>
>> OS -
>>
>> Linux test33 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>> CentOS Linux release 7.2.1511 (Core)
>>
>> Versions -
>>
>> libselinux-2.2.2-6.el7.x86_64
>> libselinux-devel-2.2.2-6.el7.x86_64
>> libselinux-python-2.2.2-6.el7.x86_64
>> libselinux-utils-2.2.2-6.el7.x86_64
>> libsemanage-2.1.10-18.el7.x86_64
>> libsemanage-python-2.1.10-18.el7.x86_64
>> libsepol-2.1.9-3.el7.x86_64
>> libsepol-devel-2.1.9-3.el7.x86_64
>> selinux-policy-3.13.1-60.el7_2.3.noarch
>> selinux-policy-targeted-3.13.1-60.el7_2.3.noarch
>>
>> SE Status -
>>
>> SELinux status:                 enabled
>> SELinuxfs mount:                /sys/fs/selinux
>> SELinux root directory:         /etc/selinux
>> Loaded policy name:             targeted
>> Current mode:                   enforcing
>> Mode from config file:          enforcing
>> Policy MLS status:              enabled
>> Policy deny_unknown status:     allowed
>> Max kernel policy version:      28
>>
>> Scenario -
>>
>> I run two Ansible roles, one deploys JupyterHub (a simple python+node
>> app) which binds to some ports, and another immediately after (same
>> Ansible playbook) that deploys Apache HTTPd (acting as reverse proxy for
>> JupyterHub).
>>
>> The JupyterHub role starts the JupyterHub app (a systemd unit) as its
>> _last_ task, and the reverse proxy role adjusts 2 SE booleans as its
>> _first_ task.
>>
>> What happens is the 2 SE booleans get adjusted while JupyterHub is still
>> starting and trying to bind to its ports, which causes the SELinux error
>> shown above (and below in logs).
>>
>> My current workaround is simply to wait at the end of my JupyterHub role
>> until it has fully bound to its ports and listening (just 2-3 seconds),
>> then continue with the reverse proxy role, but I wonder if there's a
>> potential problem / race condition in SELinux?
>>
>>
>> Here's the log:
>>
>> -- Last task of Ansible JupyterHub role - which is to start its service
>> (ignore the 'Stopped', that's due to Ansible's state=restarted, but the
>> app is not running at this time, and has never run on this host) --
>>
>> Apr 12 15:39:53 test33 ansible-<stdin>[14326]: Invoked with
>> name=jupyterhub pattern=None enabled=True state=restarted sleep=None
>> arguments= runlevel=default
>> Apr 12 15:39:53 test33 systemd[1]: Reloading.
>> Apr 12 15:39:53 test33 systemd[1]: Configuration file
>> /usr/lib/systemd/system/auditd.service is marked world-inaccessible.
>> This has no effect as configuration data is accessible via APIs without
>> restrictions. Proceeding anyway.
>> Apr 12 15:39:53 test33 systemd[1]: Stopped JupyterHub Service.
>> Apr 12 15:39:53 test33 systemd[1]: Started JupyterHub Service.
>> Apr 12 15:39:53 test33 systemd[1]: Starting JupyterHub Service...
>>
>> -- Reverse proxy role comes in, setting 'httpd_can_network_connect' and
>> 'httpd_can_network_relay' bools, but JupyterHub startup isn't done yet --
>>
>> Apr 12 15:39:53 test33 ansible-<stdin>[14367]: Invoked with warn=True
>> executable=None chdir=None _raw_params=setsebool -P
>> httpd_can_network_connect 1 && setsebool -P httpd_can_network_relay 1
>> removes=None creates=None _uses_shell=True
>> Apr 12 15:39:54 test33 dbus[537]: avc:  received policyload notice (seqno=4)
>> Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: avc:  received
>> policyload notice (seqno=4)
>> Apr 12 15:39:54 test33 dbus[537]: [system] Reloaded configuration
>> Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
>> configuration
>> Apr 12 15:39:54 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
>> Apr 12 15:39:54 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
>> Apr 12 15:39:54 test33 kernel: SELinux:  8 users, 105 roles, 4965 types,
>> 304 bools, 1 sens, 1024 cats
>> Apr 12 15:39:54 test33 kernel: SELinux:  83 classes, 105676 rules
>>
>> -- JupyterHub continues with its startup, but cannot bind with an odd
>> memory error, and kernel reports SELinux network port label issue, which
>> I'm assuming is the cause of JupyterHub failing to bind --
>>
>> Apr 12 15:39:54 test33 jupyterhub[14352]: [I 2016-04-12 15:39:54.340
>> JupyterHub app:572] Writing cookie_secret to
>> /etc/jupyterhub/jupyterhub_cookie_secret
>> Apr 12 15:39:54 test33 jupyterhub[14352]: [W 2016-04-12 15:39:54.426
>> JupyterHub app:292]
>> Apr 12 15:39:54 test33 jupyterhub[14352]: Generating
>> CONFIGPROXY_AUTH_TOKEN. Restarting the Hub will require restarting the
>> proxy.
>> Apr 12 15:39:54 test33 jupyterhub[14352]: Set CONFIGPROXY_AUTH_TOKEN env
>> or JupyterHub.proxy_auth_token config to avoid this message.
>> Apr 12 15:39:54 test33 jupyterhub[14352]: [W 2016-04-12 15:39:54.438
>> JupyterHub app:685] No admin users, admin interface will be unavailable.
>> Apr 12 15:39:54 test33 jupyterhub[14352]: [W 2016-04-12 15:39:54.438
>> JupyterHub app:686] Add any administrative users to
>> `c.Authenticator.admin_users` in config.
>> Apr 12 15:39:54 test33 jupyterhub[14352]: [I 2016-04-12 15:39:54.438
>> JupyterHub app:712] Not using whitelist. Any authenticated user will be
>> allowed.
>> Apr 12 15:39:54 test33 kernel: SELinux: failure in
>> sel_netport_sid_slow(), unable to determine network port label
>> Apr 12 15:39:54 test33 jupyterhub[14352]: [E 2016-04-12 15:39:54.452
>> JupyterHub app:1110] Failed to bind hub to http://127.0.0.1:8081/hub/
>> Apr 12 15:39:54 test33 jupyterhub[14352]: [E 2016-04-12 15:39:54.452
>> JupyterHub app:1178]
>> Apr 12 15:39:54 test33 jupyterhub[14352]: Traceback (most recent call last):
>> Apr 12 15:39:54 test33 jupyterhub[14352]: File
>> "/usr/local/anaconda3/lib/python3.5/site-packages/jupyterhub/app.py",
>> line 1176, in launch_instance_async
>> Apr 12 15:39:54 test33 jupyterhub[14352]: yield self.start()
>> Apr 12 15:39:54 test33 jupyterhub[14352]: File
>> "/usr/local/anaconda3/lib/python3.5/site-packages/jupyterhub/app.py",
>> line 1108, in start
>> Apr 12 15:39:54 test33 jupyterhub[14352]:
>> self.http_server.listen(self.hub_port, address=self.hub_ip)
>> Apr 12 15:39:54 test33 jupyterhub[14352]: File
>> "/usr/local/anaconda3/lib/python3.5/site-packages/tornado/tcpserver.py",
>> line 126, in listen
>> Apr 12 15:39:54 test33 jupyterhub[14352]: sockets = bind_sockets(port,
>> address=address)
>> Apr 12 15:39:54 test33 jupyterhub[14352]: File
>> "/usr/local/anaconda3/lib/python3.5/site-packages/tornado/netutil.py",
>> line 196, in bind_sockets
>> Apr 12 15:39:54 test33 jupyterhub[14352]: sock.bind(sockaddr)
>> Apr 12 15:39:54 test33 jupyterhub[14352]: OSError: [Errno 12] Cannot
>> allocate memory
>> Apr 12 15:39:54 test33 systemd[1]: jupyterhub.service: main process
>> exited, code=exited, status=1/FAILURE
>> Apr 12 15:39:54 test33 systemd[1]: Unit jupyterhub.service entered
>> failed state.
>> Apr 12 15:39:54 test33 systemd[1]: jupyterhub.service failed.
>>
>> -- The 2 SE bools are set, not sure why there appears to be a reload
>> every time and whether that's standard --
>>
>> Apr 12 15:39:54 test33 dbus[537]: avc:  received policyload notice (seqno=5)
>> Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: avc:  received
>> policyload notice (seqno=5)
>> Apr 12 15:39:54 test33 dbus[537]: [system] Reloaded configuration
>> Apr 12 15:39:54 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
>> configuration
>> Apr 12 15:39:55 test33 setsebool[14369]: The httpd_can_network_connect
>> policy boolean was changed to 1 by root
>> Apr 12 15:39:55 test33 dbus[537]: avc:  received policyload notice (seqno=6)
>> Apr 12 15:39:55 test33 dbus-daemon[537]: dbus[537]: avc:  received
>> policyload notice (seqno=6)
>> Apr 12 15:39:55 test33 dbus[537]: [system] Reloaded configuration
>> Apr 12 15:39:55 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
>> configuration
>> Apr 12 15:39:55 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
>> Apr 12 15:39:55 test33 kernel: SELinux: 2048 avtab hash slots, 105676 rules.
>> Apr 12 15:39:55 test33 kernel: SELinux:  8 users, 105 roles, 4965 types,
>> 304 bools, 1 sens, 1024 cats
>> Apr 12 15:39:55 test33 kernel: SELinux:  83 classes, 105676 rules
>> Apr 12 15:39:56 test33 dbus[537]: avc:  received policyload notice (seqno=7)
>> Apr 12 15:39:56 test33 dbus-daemon[537]: dbus[537]: avc:  received
>> policyload notice (seqno=7)
>> Apr 12 15:39:56 test33 dbus[537]: [system] Reloaded configuration
>> Apr 12 15:39:56 test33 dbus-daemon[537]: dbus[537]: [system] Reloaded
>> configuration
>> Apr 12 15:39:56 test33 setsebool[14379]: The httpd_can_network_relay
>> policy boolean was changed to 1 by root
>>
>>
>> As mentioned above, if I let JupyterHub fully bind to its ports before
>> 'setsebool' is issued in the follow-up Ansible role, everything works
>> absolutely fine, including restarting JupyterHub, and it works after
>> rebooting etc, all healthy and normal and no SELinux errors.
>>
>> Also, there are no policy violations of any kind when JupyterHub fails
>> to bind, all looks clean except the kernel message in the log an
>> ultimate failure to run:
>>
>> # ausearch -m avc
>> <no matches>
> 
> Thanks for the report.  Is your system under memory pressure?
> That's the only situation where I could see sel_netport_sid_slow()
> failing with ENOMEM.  The policyload notice is normal; on a boolean
> change, the kernel triggers the same userspace notification as for a
> policy reload so that userspace will flush its AVC.

Looking at the sel_netport_sid_slow() code, I don't see why we need to
fail hard if the kzalloc() of the new node fails; we just won't be able
to cache the result but can still obtain and return the SID.  Paul, do
you see any reason we can't just move up the call to security_port_sid()
and only fail if that fails?  In the common case, that won't require
memory allocation at all (only needs it if it hasn't previously mapped
the resulting context to a SID).  In the same manner, we do not fail
just because we cannot add an entry to the AVC.  The same should apply
to netnode and netif caches.

That said, it seems hard to envision that kzalloc() failing and being
able to complete the bind() operation at all, but maybe it is a
transient failure.
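
For illustration, here is a rough sketch of the reordering described above;
it is not the actual kernel code, and the helper and field names are only
approximations of what lives in security/selinux/netport.c. The idea is to
resolve the SID from policy first and treat a failed allocation of the
cache node as "skip the cache" rather than as a hard error:

static int sel_netport_sid_slow(u8 protocol, u16 pnum, u32 *sid)
{
	int ret;
	struct sel_netport *port;
	struct sel_netport *new;

	spin_lock_bh(&sel_netport_lock);

	/* Re-check the cache under the lock in case another CPU added the
	 * entry after the lockless lookup missed. */
	port = sel_netport_find(protocol, pnum);
	if (port) {
		*sid = port->psec.sid;
		spin_unlock_bh(&sel_netport_lock);
		return 0;
	}

	/* Resolve the SID from policy first; in the common case this needs
	 * no memory allocation (only a previously unseen context does). */
	ret = security_port_sid(protocol, pnum, sid);
	if (ret != 0)
		goto out;

	/* Cache the result opportunistically: if this GFP_ATOMIC allocation
	 * fails we simply return the SID uncached instead of failing the
	 * lookup. */
	new = kzalloc(sizeof(*new), GFP_ATOMIC);
	if (new) {
		new->psec.port = pnum;
		new->psec.protocol = protocol;
		new->psec.sid = *sid;
		sel_netport_insert(new);
	}

out:
	spin_unlock_bh(&sel_netport_lock);
	if (unlikely(ret))
		pr_warn("SELinux: failure in sel_netport_sid_slow(), unable to determine network port label\n");
	return ret;
}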


* Re: Failure in sel_netport_sid_slow()
  2016-04-13 19:31   ` Stephen Smalley
@ 2016-04-13 21:36     ` Paul Moore
  2016-04-14  0:33       ` Greg
  0 siblings, 1 reply; 6+ messages in thread
From: Paul Moore @ 2016-04-13 21:36 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: Greg, selinux

On Wed, Apr 13, 2016 at 3:31 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
> Looking at the sel_netport_sid_slow() code, I don't see why we need to
> fail hard if the kzalloc() of the new node fails; we just won't be able
> to cache the result but can still obtain and return the SID.  Paul, do
> you see any reason we can't just move up the call to security_port_sid()
> and only fail if that fails?  In the common case, that won't require
> memory allocation at all (only needs it if it hasn't previously mapped
> the resulting context to a SID).  In the same manner, we do not fail
> just because we cannot add an entry to the AVC.  The same should apply
> to netnode and netif caches.

Sounds reasonable to me; I'll post an RFC patch in just a minute ... it
will be RFC because it will be completely untested, and once I've had a
chance to test it I'll merge it.

> That said, it seems hard to envision that kzalloc() failing and being
> able to complete the bind() operation at all, but maybe it is a
> transient failure.

Agreed, although it might be that the rest of the allocations in the
network stack are done with something other than GFP_ATOMIC, not sure
off the top of my head.
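
For context on why an ENOMEM can show up here even when the system has
plenty of free memory: the cache-node allocation in sel_netport_sid_slow()
is made under a spinlock and uses GFP_ATOMIC, which cannot sleep or wait
for reclaim and so can fail transiently. A minimal, purely illustrative
contrast (not code from the thread):

#include <linux/slab.h>

/* Illustrative only: GFP_KERNEL may sleep and trigger reclaim, so a small
 * allocation almost never fails; GFP_ATOMIC must not sleep (e.g. while
 * holding a spinlock) and can fail transiently even when plenty of memory
 * is free. */
static void gfp_contrast_example(void)
{
	void *a, *b;

	a = kzalloc(64, GFP_KERNEL);	/* may block and reclaim; rarely fails */
	b = kzalloc(64, GFP_ATOMIC);	/* must not block; may fail under pressure */

	kfree(a);	/* kfree(NULL) is a no-op, so no NULL checks needed here */
	kfree(b);
}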

-- 
paul moore
www.paul-moore.com


* Re: Failure in sel_netport_sid_slow()
  2016-04-13 21:36     ` Paul Moore
@ 2016-04-14  0:33       ` Greg
  2016-04-14 10:07         ` Greg
  0 siblings, 1 reply; 6+ messages in thread
From: Greg @ 2016-04-14  0:33 UTC (permalink / raw)
  To: Paul Moore; +Cc: Stephen Smalley, selinux


On 13 April 2016 at 22:36, Paul Moore <paul@paul-moore.com> wrote:

> On Wed, Apr 13, 2016 at 3:31 PM, Stephen Smalley <sds@tycho.nsa.gov>
> wrote:
> > Looking at the sel_netport_sid_slow() code, I don't see why we need to
> > fail hard if the kzalloc() of the new node fails; we just won't be able
> > to cache the result but can still obtain and return the SID.  Paul, do
> > you see any reason we can't just move up the call to security_port_sid()
> > and only fail if that fails?  In the common case, that won't require
> > memory allocation at all (only needs it if it hasn't previously mapped
> > the resulting context to a SID).  In the same manner, we do not fail
> > just because we cannot add an entry to the AVC.  The same should apply
> > to netnode and netif caches.
>
> Sounds reasonable to me, I'll post a RFC patch in just a minute ... it
> will be RFC because it will be completely untested, once I've had a
> chance to test it I'll merge it.
>
> > That said, it seems hard to envision that kzalloc() failing and being
> > able to complete the bind() operation at all, but maybe it is a
> > transient failure.
>
> Agreed, although it might be that the rest of the allocations in the
> network stack are done with something other than GFP_ATOMIC, not sure
> off the top of my head.


It doesn't appear to be a memory constraint. My initial report was from an
EC2 instance with 4GiB of memory, a "t2.medium" (but with plenty left;
JupyterHub's footprint is minimal, and overall ~3.1GiB is still spare after
everything is started). I also just tried on a new "m4.4xlarge" with 64GiB
of memory, same problem.

It's very difficult to replicate outside of the first run of the 2 Ansible
roles on a fresh instance. On one instance, I was able to replicate it 4
more times after rebooting, either by running just the last couple of tasks
from role 1 and the first task from role 2, or by running equivalent
commands directly (*). But I can't replicate it anymore (despite more
reboots, resetting SE bools etc.), and I'm unable to replicate it on any
other identical instances. So it's 100% repeatable when I build a brand new
instance and run the 2 roles for the first time (without any additional
waits), but after that it's very hard to trigger.

(*) I ran something like this, where the first pair of parentheses
corresponds to the last 2 tasks of role 1 and the second pair to the first
task of role 2, and I run them in parallel to simulate the second role
starting while the first is still finishing (or at least while systemd is
still starting the service) -- "(sss_cache -E && systemctl restart sssd
oddjobd; systemctl restart jupyterhub) & (setsebool -P
httpd_can_network_connect 1 && setsebool -P httpd_can_network_relay 1)".
Initially I thought SSS might have something to do with this, but I'm not
sure anymore.

I also added "strace" to the systemd unit definition for JupyterHub and
spun up new instances, but as "strace" adds quite a lot of overhead and
therefore time/delay, everything suddenly just works, so unfortunately I
can't witness the ENOMEM in a trace. For sanity I tried yet another build
with "strace" removed again, and the issue is back.

If you're changing it to a soft failure, I'm assuming that will save the
bind in user space and it will silently succeed, but based on the above,
the root cause that triggers this in the first place might be something
other than a lack of system memory.

Thanks,

Greg.



* Re: Failure in sel_netport_sid_slow()
  2016-04-14  0:33       ` Greg
@ 2016-04-14 10:07         ` Greg
  0 siblings, 0 replies; 6+ messages in thread
From: Greg @ 2016-04-14 10:07 UTC (permalink / raw)
  To: Paul Moore; +Cc: Stephen Smalley, selinux


On 14 April 2016 at 01:33, Greg <gkubok@gmail.com> wrote:

> On 13 April 2016 at 22:36, Paul Moore <paul@paul-moore.com> wrote:
>
>> On Wed, Apr 13, 2016 at 3:31 PM, Stephen Smalley <sds@tycho.nsa.gov>
>> wrote:
>> > Looking at the sel_netport_sid_slow() code, I don't see why we need to
>> > fail hard if the kzalloc() of the new node fails; we just won't be able
>> > to cache the result but can still obtain and return the SID.  Paul, do
>> > you see any reason we can't just move up the call to security_port_sid()
>> > and only fail if that fails?  In the common case, that won't require
>> > memory allocation at all (only needs it if it hasn't previously mapped
>> > the resulting context to a SID).  In the same manner, we do not fail
>> > just because we cannot add an entry to the AVC.  The same should apply
>> > to netnode and netif caches.
>>
>> Sounds reasonable to me, I'll post a RFC patch in just a minute ... it
>> will be RFC because it will be completely untested, once I've had a
>> chance to test it I'll merge it.
>>
>> > That said, it seems hard to envision that kzalloc() failing and being
>> > able to complete the bind() operation at all, but maybe it is a
>> > transient failure.
>>
>> Agreed, although it might be that the rest of the allocations in the
>> network stack are done with something other than GFP_ATOMIC, not sure
>> off the top of my head.
>
>
> It doesn't appear to be a memory constrain. My initial report was from an
> EC2 instance with 4GiB memory, a "t2.medium" (but plenty left, JupyterHub's
> footprint is minimal, overall ~3.1GiB still spare after everything is
> started). I also just tried on a new "m4.4xlarge" with 64GiB memory, same
> problem.
>
> It's very difficult to replicate outside of the first run of the 2 Ansible
> roles on a fresh instance. On one instance, I was able to replicate it 4
> more times after rebooting, by either running just a couple of last tasks
> from role 1, and the first task from role 2, or by running equivalent
> commands directly (*). But, I can't replicate it anymore (despite more
> reboots, resetting SE bools etc), and unable to replicate on any other
> identical instances, so it's 100% repeatable when I build brand new
> instance and run the 2 roles for the first time (without any additional
> waits), but after that, very hard to trigger.
>
> (*) I ran something like this, where first parentheses pair corresponds
> with last 2 tasks of role 1, and second pair with first task of role 2, and
> I run them in parallel to simulate second role starting while first is
> still finishing or at least systemd still starting the service --
> "(sss_cache -E && systemctl restart sssd oddjobd; systemctl restart
> jupyterhub) & (setsebool -P httpd_can_network_connect 1 && setsebool -P
> httpd_can_network_relay 1)". Initially I thought SSS might have something
> to do with this, but not sure anymore.
>
> I also added "strace" to the systemd unit definition for JupyterHub and
> spun new instances, but as "strace" adds quite a lot of overhead and
> therefore time/delay, everything suddenly just works, so I can't witness
> the ENOMEM in a trace unfortunately. For sanity I tried yet another build
> with "strace" removed again, and the issue is back.
>
> If you're changing it to soft failure, that will save the bind in user
> space and it'll silently succeed I'm assuming, but based on the above, the
> root cause that triggers this in the first place might be something
> different than not enough system memory.
>

Just a follow-up. I built 7 more instances, with "strace" in the systemd
unit. 2 ended up fine (app bound and listening, no sel_netport_sid_slow
failure), while 5 ended with the failure and the app didn't bind.

I compared traces of the bad and good runs and everything is the same up to
the point of bind, which just returns ENOMEM in the failing run. So nothing
new here, but at least I can tell that the user-space contexts of failure
and success are pretty much identical, and everything hinges on the bind
syscall and the kernel (and the inexplicable perceived lack of kernel
memory that must be reported by one of the k*allocs somewhere along the
way).

Thanks,

Greg.


