IIRC shoot-downs are one of the reasons for using per-cpu PGDs, which can in-turn enable/underpin other hardening functions... presuming the churn of recent years has softened attitudes toward such core MM changes. https://forum.osdev.org/viewtopic.php?f=15&t=29661 -Boris On Mon, Aug 30, 2021 at 8:02 PM Rick Edgecombe <rick.p.edgecombe@intel.com> wrote: > > Hi, > > This is a second RFC for the PKS write protected tables concept. I'm sharing to > show the progress to interested people. I'd also appreciate any comments, > especially on the direct map page table protection solution (patch 17). > > Since v1[1], the improvements are: > - Fully handle direct map page tables, and handle hotplug/unplug path. > - Create a debug time checker that scans page tables and verifies > their protection. > - Fix odds-and-ends kernel page tables that showed up with debug > checker. At this point all of the typical normal page tables should be > protected. > - Fix toggling of writablility for odds-and-ends page table modifications found > that don't use the normal helpers. > - Create atomic context grouped page allocator, after finding some page table > allocations that are passing GFP_ATOMIC. > - Create "soft" mode that warns and disables protection on violation instead > of oopsing. > - Boot parameters for disabling pks tables > - Change PageTable set clear to ctor/dtor (peterz) > - Remove VM_BUG_ON_PAGE in alloc_table() (Shakeel Butt) > - PeterZ/Vlastimil had suggested to also build a non-PKS mode for use in > debugging. I skipped it for now because the series was too big. > - Rebased to latest PKS core v7 [2] > > Also, Mike Rapoport has been experimenting[3] with this usage to work on how to > share caches of permissioned/broken pages between use cases. This RFCv2 still > uses the "grouped pages" concept, where each usage would maintain its own > cache, but should be able to integrate with a central solution if something is > developed. 
> > Next I was planning to look into characterizing/tuning the performance, although > what page allocation scheme is ultimately used will probably impact that. > > This applies on top of the PKS core v7 series[2] and this patch[4]. Testing is > still pretty light. > > This RFC has been acked by Dave Hansen. > > [1] https://lore.kernel.org/lkml/20210505003032.489164-1-rick.p.edgecombe@intel.com/ > [2] https://lore.kernel.org/lkml/20210804043231.2655537-1-ira.weiny@intel.com/ > [3] https://lore.kernel.org/lkml/20210823132513.15836-1-rppt@kernel.org/ > [4] https://lore.kernel.org/lkml/20210818221026.10794-1-rick.p.edgecombe@intel.com/ > > Rick Edgecombe (19): > list: Support getting most recent element in list_lru > list: Support list head not in object for list_lru > x86/mm/cpa: Add grouped page allocations > mm: Explicitly zero page table lock ptr > x86, mm: Use cache of page tables > x86/mm/cpa: Add perm callbacks to grouped pages > x86/cpufeatures: Add feature for pks tables > x86/mm/cpa: Add get_grouped_page_atomic() > x86/mm: Support GFP_ATOMIC in alloc_table_node() > x86/mm: Use alloc_table() for fill_pte(), etc > mm/sparsemem: Use alloc_table() for table allocations > x86/mm: Use free_table in unmap path > mm/debug_vm_page_table: Use setters instead of WRITE_ONCE > x86/efi: Toggle table protections when copying > x86/mm/cpa: Add set_memory_pks() > x86/mm: Protect page tables with PKS > x86/mm/cpa: PKS protect direct map page tables > x86/mm: Add PKS table soft mode > x86/mm: Add PKS table debug checking > > .../admin-guide/kernel-parameters.txt | 4 + > arch/x86/boot/compressed/ident_map_64.c | 5 + > arch/x86/include/asm/cpufeatures.h | 2 +- > arch/x86/include/asm/pgalloc.h | 6 +- > arch/x86/include/asm/pgtable.h | 31 +- > arch/x86/include/asm/pgtable_64.h | 33 +- > arch/x86/include/asm/pkeys_common.h | 1 - > arch/x86/include/asm/set_memory.h | 24 + > arch/x86/mm/init.c | 90 +++ > arch/x86/mm/init_64.c | 29 +- > arch/x86/mm/pat/set_memory.c | 527 
+++++++++++++++++- > arch/x86/mm/pgtable.c | 183 +++++- > arch/x86/mm/pkeys.c | 4 + > arch/x86/platform/efi/efi_64.c | 8 + > include/asm-generic/pgalloc.h | 46 +- > include/linux/list_lru.h | 26 + > include/linux/mm.h | 16 +- > include/linux/pkeys.h | 1 + > mm/Kconfig | 23 + > mm/debug_vm_pgtable.c | 36 +- > mm/list_lru.c | 38 +- > mm/memory.c | 1 + > mm/sparse-vmemmap.c | 22 +- > mm/swap.c | 6 + > mm/swap_state.c | 5 + > .../arch/x86/include/asm/disabled-features.h | 8 +- > 26 files changed, 1123 insertions(+), 52 deletions(-) > > -- > 2.17.1 > -- Boris Lukashev Systems Architect Semper Victus
IIRC shoot-downs are one of the reasons for using per-cpu PGDs, which would
be a hard sell to some people.
https://forum.osdev.org/viewtopic.php?f=15&t=29661
-Boris
On Thu, Mar 14, 2024 at 2:26 PM Ira Weiny <ira.weiny@intel.com> wrote:
> Edgecombe, Rick P wrote:
> > On Thu, 2024-03-14 at 09:27 -0700, Kees Cook wrote:
> > > On Mon, Aug 30, 2021 at 04:59:08PM -0700, Rick Edgecombe wrote:
> > > > This is a second RFC for the PKS write protected tables concept.
> > > > I'm sharing to
> > > > show the progress to interested people. I'd also appreciate any
> > > > comments,
> > > > especially on the direct map page table protection solution (patch
> > > > 17).
> > >
> > > *thread necromancy*
> > >
> > > Hi,
> > >
> > > Where does this series stand? I don't think it ever got merged?
> >
> > There are sort of three components to this:
> > 1. Basic PKS support. It was dropped after the main use case was
> > rejected (pmem stray write protection).
>
> This was the main reason it got dropped.
>
> > 2. Solution for applying direct map permissions efficiently. This
> > includes avoiding excessive kernel shootdowns, as well as avoiding
> > direct map fragmentation. rppt continued to look at the fragmentation
> > part of the problem and ended up arguing that it actually isn't an
> > issue [0]. Regardless, the shootdown problem remains for usages like
> > PKS tables that allocate so frequently. There is an attempt to address
> > both in this series. But given the above, there may be lots of debate
> > and opinions.
> > 3. The actual protection of the PKS tables (most of this series). It
> > got paused when I started to work on CET. In the meantime 1 was
> > dropped, and 2 is still open(?). So there is more to work through now,
> > than when it was dropped.
> >
> > If anyone wants to pick it up, it is fine by me. I can help with
> > reviews.
>
> I can help with reviews as well,
> Ira
>
> >
> >
> > [0] https://lwn.net/Articles/931406/
> >
>
--
Boris Lukashev
Systems Architect
Semper Victus <https://www.sempervictus.com>
Edgecombe, Rick P wrote:
> On Thu, 2024-03-14 at 09:27 -0700, Kees Cook wrote:
> > On Mon, Aug 30, 2021 at 04:59:08PM -0700, Rick Edgecombe wrote:
> > > This is a second RFC for the PKS write protected tables concept.
> > > I'm sharing to
> > > show the progress to interested people. I'd also appreciate any
> > > comments,
> > > especially on the direct map page table protection solution (patch
> > > 17).
> >
> > *thread necromancy*
> >
> > Hi,
> >
> > Where does this series stand? I don't think it ever got merged?
>
> There are sort of three components to this:
> 1. Basic PKS support. It was dropped after the main use case was
> rejected (pmem stray write protection).
This was the main reason it got dropped.
> 2. Solution for applying direct map permissions efficiently. This
> includes avoiding excessive kernel shootdowns, as well as avoiding
> direct map fragmentation. rppt continued to look at the fragmentation
> part of the problem and ended up arguing that it actually isn't an
> issue [0]. Regardless, the shootdown problem remains for usages like
> PKS tables that allocate so frequently. There is an attempt to address
> both in this series. But given the above, there may be lots of debate
> and opinions.
> 3. The actual protection of the PKS tables (most of this series). It
> got paused when I started to work on CET. In the meantime 1 was
> dropped, and 2 is still open(?). So there is more to work through now,
> than when it was dropped.
>
> If anyone wants to pick it up, it is fine by me. I can help with
> reviews.
I can help with reviews as well,
Ira
>
>
> [0] https://lwn.net/Articles/931406/
On Thu, 2024-03-14 at 09:27 -0700, Kees Cook wrote:
> On Mon, Aug 30, 2021 at 04:59:08PM -0700, Rick Edgecombe wrote:
> > This is a second RFC for the PKS write protected tables concept.
> > I'm sharing to
> > show the progress to interested people. I'd also appreciate any
> > comments,
> > especially on the direct map page table protection solution (patch
> > 17).
>
> *thread necromancy*
>
> Hi,
>
> Where does this series stand? I don't think it ever got merged?
There are sort of three components to this:
1. Basic PKS support. It was dropped after the main use case was
rejected (pmem stray write protection).
2. Solution for applying direct map permissions efficiently. This
includes avoiding excessive kernel shootdowns, as well as avoiding
direct map fragmentation. rppt continued to look at the fragmentation
part of the problem and ended up arguing that it actually isn't an
issue [0]. Regardless, the shootdown problem remains for usages like
PKS tables that allocate so frequently. There is an attempt to address
both in this series. But given the above, there may be lots of debate
and opinions.
3. The actual protection of the PKS tables (most of this series). It
got paused when I started to work on CET. In the meantime 1 was
dropped, and 2 is still open(?). So there is more to work through now,
than when it was dropped.
If anyone wants to pick it up, it is fine by me. I can help with
reviews.
[0] https://lwn.net/Articles/931406/
On Mon, Aug 30, 2021 at 04:59:08PM -0700, Rick Edgecombe wrote:
> This is a second RFC for the PKS write protected tables concept. I'm sharing to
> show the progress to interested people. I'd also appreciate any comments,
> especially on the direct map page table protection solution (patch 17).
*thread necromancy*
Hi,
Where does this series stand? I don't think it ever got merged?
-Kees
--
Kees Cook
============================================================================== ANNOUNCEMENT AND CALL FOR PARTICIPATION LINUX SECURITY SUMMIT EUROPE 2024 September 16-17 Vienna, Austria ============================================================================== DESCRIPTION Linux Security Summit Europe 2024 is a technical forum for collaboration between Linux developers, researchers, and end-users. Its primary aim is to foster community efforts in deeply analyzing and solving Linux operating system security challenges, including those in the Linux kernel. Presentations are expected to focus deeply on new or improved technology and how it advances the state of practice for addressing these challenges. The program committee currently seeks proposals for: * Refereed Presentations: 45 minutes in length. * Panel Discussion Topics: 45 minutes in length. * Short Topics: 30 minutes in total, including at least 10 minutes discussion. * Tutorials 90 minutes in length. Tutorial sessions should be focused on advanced Linux security defense topics within areas such as the kernel, compiler, and security-related libraries. Priority will be given to tutorials created for this conference, and those where the presenter is a leading subject matter expert on the topic. 
Topic areas include, but are not limited to: * Access Control * Case Studies * Cryptography and Key Management * Emerging Technologies, Threats & Techniques * Hardware Security * IoT and Embedded Security * Integrity Policy and Enforcement * Open Source Supply Chain for the Linux OS * Security Tools * Security UX * Linux OS Hardening * Virtualization and Containers Proposals should be submitted via: https://events.linuxfoundation.org/linux-security-summit-europe/program/cfp/ LSS-EU DATES * CFP close: May 19, 2024 * CFP notifications: Jun 10, 2024 * Schedule announced: Jun 12, 2024 * Event: Sep 16-17, 2024 WHO SHOULD ATTEND We're seeking a diverse range of attendees and welcome participation by people involved in Linux security development, operations, and research. LSS is a unique global event that provides the opportunity to present and discuss your work or research with key Linux security community members and maintainers. It's also useful for those who wish to keep up with the latest in Linux security development and to provide input to the development process. WEB SITE https://events.linuxfoundation.org/linux-security-summit-europe/ MASTODON For event updates and announcements, follow: https://social.kernel.org/LinuxSecSummit #linuxsecuritysummit PROGRAM COMMITTEE The program committee for LSS EU 2024 is: * Elena Reshetova, Intel * James Morris, Microsoft * Serge Hallyn, Cisco * Paul Moore, Microsoft * Stephen Smalley, NSA * John Johansen, Canonical * Kees Cook, Google * Casey Schaufler * Mimi Zohar, IBM * David A. Wheeler, Linux Foundation The program committee may be contacted as a group via email: lss-pc () lists.linuxfoundation.org
============================================================================== ANNOUNCEMENT AND CALL FOR PARTICIPATION LINUX SECURITY SUMMIT NORTH AMERICA 2024 April 18-19 Seattle, WA, USA ============================================================================== DESCRIPTION Linux Security Summit North America 2024 is a technical forum for collaboration between Linux developers, researchers, and end-users. Its primary aim is to foster community efforts in deeply analyzing and solving Linux operating system security challenges, including those in the Linux kernel. Presentations are expected to focus deeply on new or improved technology and how it advances the state of practice for addressing these challenges. The program committee currently seeks proposals for: * Refereed Presentations: 45 minutes in length. * Panel Discussion Topics: 45 minutes in length. * Short Topics: 30 minutes in total, including at least 10 minutes discussion. * Tutorials 90 minutes in length. Tutorial sessions should be focused on advanced Linux security defense topics within areas such as the kernel, compiler, and security-related libraries. Priority will be given to tutorials created for this conference, and those where the presenter is a leading subject matter expert on the topic. 
Topic areas include, but are not limited to: * Access Control * Case Studies * Cryptography and Key Management * Emerging Technologies, Threats & Techniques * Hardware Security * IoT and Embedded Security * Integrity Policy and Enforcement * Open Source Supply Chain for the Linux OS * Security Tools * Security UX * Linux OS Hardening * Virtualization and Containers Proposals should be submitted via: https://events.linuxfoundation.org/linux-security-summit-north-america/ LSS-NA DATES * CFP close: Jan 21, 2024 * CFP notifications: Feb 06, 2024 * Schedule announced: Feb 08, 2024 * Event: Apr 18-19, 2024 WHO SHOULD ATTEND We're seeking a diverse range of attendees and welcome participation by people involved in Linux security development, operations, and research. LSS is a unique global event that provides the opportunity to present and discuss your work or research with key Linux security community members and maintainers. It's also useful for those who wish to keep up with the latest in Linux security development and to provide input to the development process. WEB SITE https://events.linuxfoundation.org/linux-security-summit-north-america/ MASTODON For event updates and announcements, follow: https://social.kernel.org/LinuxSecSummit #linuxsecuritysummit PROGRAM COMMITTEE The program committee for LSS 2024 is: * James Morris, Microsoft * Serge Hallyn, Cisco * Paul Moore, Microsoft * Stephen Smalley, NSA * Elena Reshetova, Intel * John Johansen, Canonical * Kees Cook, Google * Casey Schaufler * Mimi Zohar, IBM * David A. Wheeler, Linux Foundation The program committee may be contacted as a group via email: lss-pc () lists.linuxfoundation.org
On Wed, Nov 01, 2023 at 05:23:12PM +0100, Jann Horn wrote:
> On Wed, Nov 1, 2023 at 11:57 AM Mickaël Salaün <mic@digikod.net> wrote:
> > On Tue, Oct 31, 2023 at 09:40:59PM +0100, Stefan Bavendiek wrote:
> > > On Tue, Oct 24, 2023 at 11:07:14AM -0500, Serge E. Hallyn wrote:
> > > > In 2005, before namespaces were upstreamed, I posted the 'bsdjail' LSM,
> > > > which briefly made it into the -mm kernel, but was eventually rejected as
> > > > being an abuse of the LSM interface for OS level virtualization :)
> > > >
> > > > It's not 100% clear to me whether Stefan only wants isolation, or
> > > > wants something closer to virtualization.
> > > >
> > > > Stefan, would an LSM allowing you to isolate certain processes from
> > > > some abstract unix socket paths (or by label, whatever) suffice for you?
> > > >
> > >
> > > My intention was to find a clean way to isolate abstract sockets in network
> > > applications without adding dependencies like LSMs. However the entire approach
> > > of using namespaces for this is something I have mostly abandoned. LSMs like
> > > AppArmor and SELinux would work fine for process isolation when you can control
> > > the target system, but for general deployment of sandboxed processes, I found it
> > > to be significantly easier (and more effective) to build this into the
> > > application itself by using a multi process approach with seccomp (Basically how
> > > OpenSSH did it)
> >
> > I agree that for sandbox use cases embedding such security policy into
> > the application itself makes sense. Landlock works the same way as
> > seccomp but it sandboxes applications according to the kernel semantic
> > (e.g. process, socket). The LSM framework is just a kernel
> > implementation detail. ;)
>
> (Related, it might be nice if Landlock had a way to completely deny
> access to abstract unix sockets,
I think it would make more sense to scope access to abstract unix
sockets:
https://lore.kernel.org/all/20231025.eecai4uGh5Ie@digikod.net/
A complementary approach would be to restrict socket creation according
to their properties:
https://lore.kernel.org/all/b8a2045a-e7e8-d141-7c01-bf47874c7930@digikod.net/
> and a way to restrict filesystem unix
> sockets with filesystem rules... LANDLOCK_ACCESS_FS_MAKE_SOCK exists
> for restricting bind(), but I don't think there's an analogous
> permission for connect().
I agree. It should not be too difficult to add a new LSM path hook for
connect (and sendmsg) to named unix socket with the related access
rights. We should be careful about the impact on sendmsg calls though.
>
> Currently, when you try to sandbox an application with Landlock, you
> have to use seccomp to completely block access to unix domain sockets,
> or alternatively use something like the seccomp_unotify feature to
> interactively filter connect() calls.
>
> On the other hand, maybe such a feature would be a bit superfluous
> when we have seccomp_unotify already... idk.)
seccomp_unotify enables user space to emulate syscalls, which requires a
service per sandbox. seccomp is useful but will always be delicate to
use and to maintain the related filters for sandboxing use cases:
https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/
Anyway, I'd be happy to help improve Landlock with new access control
types. FYI, TCP connect and bind access control should be part of
Linux 6.7:
https://lore.kernel.org/all/20231102131354.263678-1-mic@digikod.net/
On Wed, Nov 1, 2023 at 11:57 AM Mickaël Salaün <mic@digikod.net> wrote:
> On Tue, Oct 31, 2023 at 09:40:59PM +0100, Stefan Bavendiek wrote:
> > On Tue, Oct 24, 2023 at 11:07:14AM -0500, Serge E. Hallyn wrote:
> > > In 2005, before namespaces were upstreamed, I posted the 'bsdjail' LSM,
> > > which briefly made it into the -mm kernel, but was eventually rejected as
> > > being an abuse of the LSM interface for OS level virtualization :)
> > >
> > > It's not 100% clear to me whether Stefan only wants isolation, or
> > > wants something closer to virtualization.
> > >
> > > Stefan, would an LSM allowing you to isolate certain processes from
> > > > some abstract unix socket paths (or by label, whatever) suffice for you?
> > >
> >
> > My intention was to find a clean way to isolate abstract sockets in network
> > applications without adding dependencies like LSMs. However the entire approach
> > of using namespaces for this is something I have mostly abandoned. LSMs like
> > > AppArmor and SELinux would work fine for process isolation when you can control
> > the target system, but for general deployment of sandboxed processes, I found it
> > to be significantly easier (and more effective) to build this into the
> > application itself by using a multi process approach with seccomp (Basically how
> > OpenSSH did it)
>
> I agree that for sandbox use cases embedding such security policy into
> the application itself makes sense. Landlock works the same way as
> seccomp but it sandboxes applications according to the kernel semantic
> (e.g. process, socket). The LSM framework is just a kernel
> implementation detail. ;)
(Related, it might be nice if Landlock had a way to completely deny
access to abstract unix sockets, and a way to restrict filesystem unix
sockets with filesystem rules... LANDLOCK_ACCESS_FS_MAKE_SOCK exists
for restricting bind(), but I don't think there's an analogous
permission for connect().
Currently, when you try to sandbox an application with Landlock, you
have to use seccomp to completely block access to unix domain sockets,
or alternatively use something like the seccomp_unotify feature to
interactively filter connect() calls.
On the other hand, maybe such a feature would be a bit superfluous
when we have seccomp_unotify already... idk.)
On Tue, Oct 31, 2023 at 09:40:59PM +0100, Stefan Bavendiek wrote:
> On Tue, Oct 24, 2023 at 11:07:14AM -0500, Serge E. Hallyn wrote:
> > In 2005, before namespaces were upstreamed, I posted the 'bsdjail' LSM,
> > which briefly made it into the -mm kernel, but was eventually rejected as
> > being an abuse of the LSM interface for OS level virtualization :)
> >
> > It's not 100% clear to me whether Stefan only wants isolation, or
> > wants something closer to virtualization.
> >
> > Stefan, would an LSM allowing you to isolate certain processes from
> > > some abstract unix socket paths (or by label, whatever) suffice for you?
> >
>
> My intention was to find a clean way to isolate abstract sockets in network
> applications without adding dependencies like LSMs. However the entire approach
> of using namespaces for this is something I have mostly abandoned. LSMs like
> > AppArmor and SELinux would work fine for process isolation when you can control
> the target system, but for general deployment of sandboxed processes, I found it
> to be significantly easier (and more effective) to build this into the
> application itself by using a multi process approach with seccomp (Basically how
> OpenSSH did it)
I agree that for sandbox use cases embedding such security policy into
the application itself makes sense. Landlock works the same way as
seccomp but it sandboxes applications according to the kernel semantic
(e.g. process, socket). The LSM framework is just a kernel
implementation detail. ;)
On Tue, Oct 24, 2023 at 11:07:14AM -0500, Serge E. Hallyn wrote:
> In 2005, before namespaces were upstreamed, I posted the 'bsdjail' LSM,
> which briefly made it into the -mm kernel, but was eventually rejected as
> being an abuse of the LSM interface for OS level virtualization :)
>
> It's not 100% clear to me whether Stefan only wants isolation, or
> wants something closer to virtualization.
>
> Stefan, would an LSM allowing you to isolate certain processes from
> > some abstract unix socket paths (or by label, whatever) suffice for you?
>
My intention was to find a clean way to isolate abstract sockets in network
applications without adding dependencies like LSMs. However the entire approach
of using namespaces for this is something I have mostly abandoned. LSMs like
AppArmor and SELinux would work fine for process isolation when you can control
the target system, but for general deployment of sandboxed processes, I found it
to be significantly easier (and more effective) to build this into the
application itself by using a multi process approach with seccomp (Basically how
OpenSSH did it)
- Stefan
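For readers who want to see the shape of the multi-process approach
described above, here is a minimal sketch, not OpenSSH's actual code.
It assumes Linux on x86_64 or aarch64 (the socket(2) syscall numbers in
the table are architecture specific), and the helper names
deny_new_unix_sockets() and run_sandboxed_worker() are made up for this
illustration. The parent keeps an already-connected socketpair to the
worker; the worker then installs a seccomp filter that makes
socket(AF_UNIX, ...) fail with EPERM. As noted earlier in the thread,
seccomp cannot inspect the address passed to connect(), so the filter
can only refuse new unix sockets wholesale. A production filter would
also check seccomp_data.arch and consider socketpair(); both are left
out here for brevity.

```python
import ctypes
import errno
import os
import platform
import socket
import struct

# socket(2) syscall numbers; these differ per architecture.
SOCKET_SYSCALL_NR = {"x86_64": 41, "aarch64": 198}

PR_SET_SECCOMP = 22
PR_SET_NO_NEW_PRIVS = 38
SECCOMP_MODE_FILTER = 2
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ERRNO = 0x00050000


class SockFprog(ctypes.Structure):
    """Mirrors struct sock_fprog from <linux/filter.h>."""
    _fields_ = [("len", ctypes.c_ushort), ("filter", ctypes.c_void_p)]


def deny_new_unix_sockets():
    """Install a seccomp filter that fails socket(AF_UNIX, ...) with EPERM."""
    nr = SOCKET_SYSCALL_NR[platform.machine()]
    insns = [
        (0x20, 0, 0, 0),                # load seccomp_data.nr
        (0x15, 0, 3, nr),               # not socket(2): jump to ALLOW
        (0x20, 0, 0, 16),               # load low 32 bits of args[0] (domain)
        (0x15, 0, 1, socket.AF_UNIX),   # not AF_UNIX: jump to ALLOW
        (0x06, 0, 0, SECCOMP_RET_ERRNO | errno.EPERM),
        (0x06, 0, 0, SECCOMP_RET_ALLOW),
    ]
    # struct sock_filter is { u16 code; u8 jt; u8 jf; u32 k; }.
    raw = b"".join(struct.pack("HBBI", *insn) for insn in insns)
    buf = ctypes.create_string_buffer(raw)
    prog = SockFprog(len(insns), ctypes.addressof(buf))
    libc = ctypes.CDLL(None, use_errno=True)
    ul = ctypes.c_ulong
    # Unprivileged processes must set no_new_privs before adding a filter.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, ul(1), ul(0), ul(0), ul(0)) != 0:
        raise OSError(ctypes.get_errno(), "PR_SET_NO_NEW_PRIVS failed")
    if libc.prctl(PR_SET_SECCOMP, ul(SECCOMP_MODE_FILTER),
                  ctypes.c_void_p(ctypes.addressof(prog))) != 0:
        raise OSError(ctypes.get_errno(), "PR_SET_SECCOMP failed")


def run_sandboxed_worker():
    """Fork a worker, sandbox it, and report whether AF_UNIX was denied."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        parent_end.close()
        status = 1
        try:
            deny_new_unix_sockets()
            try:
                socket.socket(socket.AF_UNIX, socket.SOCK_STREAM).close()
                verdict = b"allowed"
            except PermissionError:
                verdict = b"denied"
            # The inherited socketpair still works inside the sandbox.
            child_end.sendall(verdict)
            status = 0
        finally:
            os._exit(status)
    child_end.close()
    reply = parent_end.recv(64)
    parent_end.close()
    os.waitpid(pid, 0)
    return reply


if __name__ == "__main__":
    print(run_sandboxed_worker())
```

The point of the split is that the worker only ever talks to the parent
over the descriptor it inherited, so blocking every new unix socket
costs it nothing, while the parent stays unsandboxed and can do the
privileged or connected work on its behalf.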
On Wed, Oct 25, 2023 at 7:22 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>
> On Wed, Oct 25, 2023 at 07:10:07PM +0200, Jann Horn wrote:
> > On Tue, Oct 24, 2023 at 3:46 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> > > Disabling them altogether would break lots of things depending on them,
> > > like X :) (@/tmp/.X11-unix/X0).
> >
> > FWIW, X can connect over both filesystem-based unix domain sockets and
> > abstract unix domain sockets. When a normal X client tries to connect
> > to the server, it'll try a bunch of stuff, including an abstract unix
> > socket address, a filesystem-based unix socket address, and TCP:
> >
> > $ DISPLAY=:12345 strace -f -e trace=connect xev >/dev/null
> > connect(3, {sa_family=AF_UNIX, sun_path=@"/tmp/.X11-unix/X12345"}, 24)
> > = -1 ECONNREFUSED (Connection refused)
> > connect(3, {sa_family=AF_UNIX, sun_path="/tmp/.X11-unix/X12345"}, 110)
> > = -1 ENOENT (No such file or directory)
> > [...]
> > connect(3, {sa_family=AF_INET, sin_port=htons(18345),
> > sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> > connect(3, {sa_family=AF_INET6, sin6_port=htons(18345),
> > inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0),
> > sin6_scope_id=0}, 28) = 0
> > connect(3, {sa_family=AF_INET6, sin6_port=htons(18345),
> > inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0),
> > sin6_scope_id=0}, 28) = -1 ECONNREFUSED (Connection refused)
> > connect(3, {sa_family=AF_INET, sin_port=htons(18345),
> > sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection
> > refused)
> >
> > And the X server normally listens on both an abstract and a
> > filesystem-based unix socket address (see "netstat --unix -lnp").
> >
> > So rejecting abstract unix socket connections shouldn't prevent an X
> > client from connecting to the X server, I think.
>
> Well it was just an example :) D-Bus is another. But maybe all
> the users of abstract unix sockets will fall back gracefully to
> something else. That'd be nice.
For what it's worth, when I try to connect to the session or system bus
on my system (like "strace -f -e trace=connect dbus-send
--session/--system /foo foo"), the connections seem to go directly to a
filesystem socket...
> For X, abstract really doesn't even make sense to me. Has it always
> supported that?
No idea.
On Wed, Oct 25, 2023 at 07:10:07PM +0200, Jann Horn wrote:
> On Tue, Oct 24, 2023 at 3:46 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> > Disabling them altogether would break lots of things depending on them,
> > like X :) (@/tmp/.X11-unix/X0).
>
> FWIW, X can connect over both filesystem-based unix domain sockets and
> abstract unix domain sockets. When a normal X client tries to connect
> to the server, it'll try a bunch of stuff, including an abstract unix
> socket address, a filesystem-based unix socket address, and TCP:
>
> $ DISPLAY=:12345 strace -f -e trace=connect xev >/dev/null
> connect(3, {sa_family=AF_UNIX, sun_path=@"/tmp/.X11-unix/X12345"}, 24)
> = -1 ECONNREFUSED (Connection refused)
> connect(3, {sa_family=AF_UNIX, sun_path="/tmp/.X11-unix/X12345"}, 110)
> = -1 ENOENT (No such file or directory)
> [...]
> connect(3, {sa_family=AF_INET, sin_port=htons(18345),
> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> connect(3, {sa_family=AF_INET6, sin6_port=htons(18345),
> inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0),
> sin6_scope_id=0}, 28) = 0
> connect(3, {sa_family=AF_INET6, sin6_port=htons(18345),
> inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0),
> sin6_scope_id=0}, 28) = -1 ECONNREFUSED (Connection refused)
> connect(3, {sa_family=AF_INET, sin_port=htons(18345),
> sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection
> refused)
>
> And the X server normally listens on both an abstract and a
> filesystem-based unix socket address (see "netstat --unix -lnp").
>
> So rejecting abstract unix socket connections shouldn't prevent an X
> client from connecting to the X server, I think.
Well it was just an example :) D-Bus is another. But maybe all
the users of abstract unix sockets will fall back gracefully to
something else. That'd be nice.
For X, abstract really doesn't even make sense to me. Has it always
supported that?
On Tue, Oct 24, 2023 at 3:46 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> Disabling them altogether would break lots of things depending on them,
> like X :) (@/tmp/.X11-unix/X0).
FWIW, X can connect over both filesystem-based unix domain sockets and
abstract unix domain sockets. When a normal X client tries to connect
to the server, it'll try a bunch of stuff, including an abstract unix
socket address, a filesystem-based unix socket address, and TCP:
$ DISPLAY=:12345 strace -f -e trace=connect xev >/dev/null
connect(3, {sa_family=AF_UNIX, sun_path=@"/tmp/.X11-unix/X12345"}, 24)
= -1 ECONNREFUSED (Connection refused)
connect(3, {sa_family=AF_UNIX, sun_path="/tmp/.X11-unix/X12345"}, 110)
= -1 ENOENT (No such file or directory)
[...]
connect(3, {sa_family=AF_INET, sin_port=htons(18345),
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(18345),
inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0),
sin6_scope_id=0}, 28) = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(18345),
inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0),
sin6_scope_id=0}, 28) = -1 ECONNREFUSED (Connection refused)
connect(3, {sa_family=AF_INET, sin_port=htons(18345),
sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection
refused)
And the X server normally listens on both an abstract and a
filesystem-based unix socket address (see "netstat --unix -lnp").
So rejecting abstract unix socket connections shouldn't prevent an X
client from connecting to the X server, I think.
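The difference between the two address types in the trace above can be
reproduced without X. The following is a small sketch, assuming Linux;
the socket names are hypothetical and made unique per process so they
cannot collide with a real service. An address whose first byte is NUL
lives in the abstract namespace (strace prints it with a leading @),
anything else is looked up as a filesystem path.

```python
import errno
import os
import shutil
import socket
import tempfile


def connect_result(address):
    """Try a stream connect to `address`; return "ok" or the errno name."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        try:
            client.connect(address)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


# Hypothetical addresses for illustration only. The leading NUL byte
# selects the abstract namespace; the second address is an ordinary path.
abstract_addr = b"\0example-abstract-" + str(os.getpid()).encode()
scratch_dir = tempfile.mkdtemp()
fs_addr = os.path.join(scratch_dir, "example.sock")

# Nothing is bound yet, so both connects fail, but with different errors.
abstract_before = connect_result(abstract_addr)
fs_before = connect_result(fs_addr)

# Bind a listener in the abstract namespace only.
listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
listener.bind(abstract_addr)
listener.listen(1)
abstract_after = connect_result(abstract_addr)
fs_after = connect_result(fs_addr)

listener.close()
shutil.rmtree(scratch_dir)

print(abstract_before, fs_before, abstract_after, fs_after)
```

The unbound abstract name is refused while the missing path is reported
as nonexistent, matching the ECONNREFUSED and ENOENT lines in the
trace. Binding the abstract name creates nothing in the filesystem,
which is why mount namespaces and filesystem rules do not cover it.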
On Tue, Oct 24, 2023 at 11:07:14AM -0500, Serge E. Hallyn wrote: > On Tue, Oct 24, 2023 at 10:29:17AM -0400, Paul Moore wrote: > > On Tue, Oct 24, 2023 at 10:18 AM Serge E. Hallyn <serge@hallyn.com> wrote: > > > On Tue, Oct 24, 2023 at 10:14:29AM -0400, Paul Moore wrote: > > > > On Tue, Oct 24, 2023 at 9:46 AM Serge E. Hallyn <serge@hallyn.com> wrote: > > > > > On Sun, Dec 18, 2022 at 08:29:10PM +0100, Stefan Bavendiek wrote: > > > > > > When building userspace application sandboxes, one issue that does not seem trivial to solve is the isolation of abstract sockets. > > > > > > > > > > Veeery late reply. Have you had any productive discussions about this in > > > > > other threads or venues? > > > > > > > > > > > While most IPC mechanism can be isolated by mechanisms like mount namespaces, abstract sockets are part of the network namespace. > > > > > > It is possible to isolate abstract sockets by using a new network namespace, however, unprivileged processes can only create a new empty network namespace, which removes network access as well and makes this useless for network clients. > > > > > > > > > > > > Same linux sandbox projects try to solve this by bridging the existing network interfaces into the new namespace or use something like slirp4netns to archive this, but this does not look like an ideal solution to this problem, especially since sandboxing should reduce the kernel attack surface without introducing more complexity. > > > > > > > > > > > > Aside from containers using namespaces, sandbox implementations based on seccomp and landlock would also run into the same problem, since landlock only provides file system isolation and seccomp cannot filter the path argument and therefore it can only be used to block new unix domain socket connections completely. > > > > > > > > > > > > Currently there does not seem to be any way to disable network namespaces in the kernel without also disabling unix domain sockets. 
> > > > > > > > > > > > The question is how to solve the issue of abstract socket isolation in a clean and efficient way, possibly even without namespaces. > > > > > > What would be the ideal way to implement a mechanism to disable abstract sockets either globally or even better, in the context of a process. > > > > > > And would such a patch have a realistic chance to make it into the kernel? > > > > > > > > > > Disabling them altogether would break lots of things depending on them, > > > > > like X :) (@/tmp/.X11-unix/X0). The other path is to reconsider network > > > > > namespaces. There are several directions this could lead. For one, as > > > > > Dinesh Subhraveti often points out, the current "network" namespace is > > > > > really a network device namespace. If we instead namespace at the > > > > > bind/connect/etc calls, we end up with much different abilities. > > > > > > > > The LSM layer supports access controls on abstract sockets, with at > > > > least two (AppArmor, SELinux) providing abstract socket access > > > > controls, other LSMs may provide controls as well. > > > > > > Good point. And for Stefan that may suffice, so thanks for mentioning > > > that. But The LSM layer is mandatory access control for use by the > > > admins. That doesn't help an unprivileged user. > > > > Individual LSMs may implement mandatory access control models, but > > that is not an inherent requirement imposed by the LSM layer. While > > the Landlock LSM does not (yet?) support access controls for abstract > > sockets, it is a discretionary access control mechanism. A recent discussion focused on this topic: https://lore.kernel.org/all/20231023.ahphah4Wii4v@digikod.net/ I'd like Landlock to be able to scope the use of unix sockets according to a Landlock domain the same way it is done for ptrace. This would make it possible to easily isolate unix sockets to a sandbox even by unprivileged processes (without any namespace change). 
I'd be happy to help implement such a mechanism. > > In 2005, before namespaces were upstreamed, I posted the 'bsdjail' LSM, > which briefly made it into the -mm kernel, but was eventually rejected as > being an abuse of the LSM interface for OS level virtualization :) > > It's not 100% clear to me whether Stefan only wants isolation, or > wants something closer to virtualization. > > Stefan, would an LSM allowing you to isolate certain processes from > some abstract unix socket paths (or by label, whatever) suffice for you? > > > I'm not currently aware of a discretionary access control LSM that > > supports abstract socket access control, but such a LSM should be > > possible if someone wanted to implement one. > > > > -- > > paul-moore.com
Yeah, I think I've heard the term "socket namespaces" before, and I
agree that changing the term 'network namespaces' in the kernel would
probably not be practical at this point.
On Tue, Oct 24, 2023 at 11:55:43AM -0400, Boris Lukashev wrote:
> Good point: from the "resources granted to a user" perspective, that does
> help bound their consumption. The nomenclature distinction seems like a
> good one to have, but if "network namespaces" *change the meaning of the
> term *and the original definition becomes "network device namespaces," then
> there would be a period where older and newer kernels have very different
> functions mapped to the same conceptual name. Might this make a bit more
> sense as "network namespaces" meaning what they do now - "network device
> namespaces," effectively; while the new concept would be "socket
> namespaces" to account for the various socket style interfaces provided?
>
> Thanks
> -Boris
>
> On Tue, Oct 24, 2023 at 10:15 AM Serge E. Hallyn <serge@hallyn.com> wrote:
>
> > Thanks for the reply. Do you have any papers which came out of this r&d
> > phase? Sounds very interesting.
> >
> > > Multiple NS' sharing an IP stack would exhaust ephemeral ranges faster
> >
> > Yes, but that could be a feature. I think of it as: I'm unprivileged
> > user serge, and I want to fire off firefox in a whatzit-namespace so
> > that I can redirect or forbid some connections. In this case, the
> > admins have not agreed to let me double my resource usage, so the fact
> > that the new namespace is sharing mine is a feature. And this lets
> > me use network-namespace-like features completely unprivileged, without
> > having to use a setuid-root helper to hook up a bridge.
> >
> > But, I didn't send this reply to advocate this approach. My main point
> > was to mention that "network namespaces are network device namespaces"
> > and hope that others would bring other suggestions for alternatives.
> >
> > -serge
> >
> > On Tue, Oct 24, 2023 at 10:05:29AM -0400, Boris Lukashev wrote:
> > > Namespacing at OSI4 seems a bit fraught as the underlying route, mac,
> > and physdev fall outside the callers control. Multiple NS' sharing an IP
> > stack would exhaust ephemeral ranges faster (likely asymmetrically too) and
> > have bound socket collisions opaque to each other requiring handling
> > outside the NS/containers purview. We looked at this sort of thing during
> > the r&d phase of our assured comms work (namespaces were young) and found a
> > bunch of overhead and collision concerns. Not saying it can't be done, but
> > getting consumers to play nice enough with such an approach may be a heavy
> > lift.
> > >
> > > Thanks,
> > > -Boris
> > >
> > >
> > > On October 24, 2023 9:46:08 AM EDT, "Serge E. Hallyn" <serge@hallyn.com>
> > wrote:
> > > >On Sun, Dec 18, 2022 at 08:29:10PM +0100, Stefan Bavendiek wrote:
> > > >> When building userspace application sandboxes, one issue that does
> > not seem trivial to solve is the isolation of abstract sockets.
> > > >
> > > >Veeery late reply. Have you had any productive discussions about this
> > in
> > > >other threads or venues?
> > > >
> > > >> While most IPC mechanism can be isolated by mechanisms like mount
> > namespaces, abstract sockets are part of the network namespace.
> > > >> It is possible to isolate abstract sockets by using a new network
> > namespace, however, unprivileged processes can only create a new empty
> > network namespace, which removes network access as well and makes this
> > useless for network clients.
> > > >>
> > > >> Same linux sandbox projects try to solve this by bridging the
> > existing network interfaces into the new namespace or use something like
> > slirp4netns to archive this, but this does not look like an ideal solution
> > to this problem, especially since sandboxing should reduce the kernel
> > attack surface without introducing more complexity.
> > > >>
> > > >> Aside from containers using namespaces, sandbox implementations based
> > on seccomp and landlock would also run into the same problem, since
> > landlock only provides file system isolation and seccomp cannot filter the
> > path argument and therefore it can only be used to block new unix domain
> > socket connections completely.
> > > >>
> > > >> Currently there does not seem to be any way to disable network
> > namespaces in the kernel without also disabling unix domain sockets.
> > > >>
> > > >> The question is how to solve the issue of abstract socket isolation
> > in a clean and efficient way, possibly even without namespaces.
> > > >> What would be the ideal way to implement a mechanism to disable
> > abstract sockets either globally or even better, in the context of a
> > process.
> > > >> And would such a patch have a realistic chance to make it into the
> > kernel?
> > > >
> > > >Disabling them altogether would break lots of things depending on them,
> > > >like X :) (@/tmp/.X11-unix/X0). The other path is to reconsider
> > network
> > > >namespaces. There are several directions this could lead. For one, as
> > > >Dinesh Subhraveti often points out, the current "network" namespace is
> > > >really a network device namespace. If we instead namespace at the
> > > >bind/connect/etc calls, we end up with much different abilities. You
> > > >can implement something like this today using seccomp-filter.
> > > >
> > > >-serge
> >
>
>
> --
> Boris Lukashev
> Systems Architect
> Semper Victus <https://www.sempervictus.com>
On Tue, Oct 24, 2023 at 10:29:17AM -0400, Paul Moore wrote: > On Tue, Oct 24, 2023 at 10:18 AM Serge E. Hallyn <serge@hallyn.com> wrote: > > On Tue, Oct 24, 2023 at 10:14:29AM -0400, Paul Moore wrote: > > > On Tue, Oct 24, 2023 at 9:46 AM Serge E. Hallyn <serge@hallyn.com> wrote: > > > > On Sun, Dec 18, 2022 at 08:29:10PM +0100, Stefan Bavendiek wrote: > > > > > When building userspace application sandboxes, one issue that does not seem trivial to solve is the isolation of abstract sockets. > > > > > > > > Veeery late reply. Have you had any productive discussions about this in > > > > other threads or venues? > > > > > > > > > While most IPC mechanism can be isolated by mechanisms like mount namespaces, abstract sockets are part of the network namespace. > > > > > It is possible to isolate abstract sockets by using a new network namespace, however, unprivileged processes can only create a new empty network namespace, which removes network access as well and makes this useless for network clients. > > > > > > > > > > Same linux sandbox projects try to solve this by bridging the existing network interfaces into the new namespace or use something like slirp4netns to archive this, but this does not look like an ideal solution to this problem, especially since sandboxing should reduce the kernel attack surface without introducing more complexity. > > > > > > > > > > Aside from containers using namespaces, sandbox implementations based on seccomp and landlock would also run into the same problem, since landlock only provides file system isolation and seccomp cannot filter the path argument and therefore it can only be used to block new unix domain socket connections completely. > > > > > > > > > > Currently there does not seem to be any way to disable network namespaces in the kernel without also disabling unix domain sockets. 
> > > > > > > > > > The question is how to solve the issue of abstract socket isolation in a clean and efficient way, possibly even without namespaces. > > > > > What would be the ideal way to implement a mechanism to disable abstract sockets either globally or even better, in the context of a process. > > > > > And would such a patch have a realistic chance to make it into the kernel? > > > > > > > > Disabling them altogether would break lots of things depending on them, > > > > like X :) (@/tmp/.X11-unix/X0). The other path is to reconsider network > > > > namespaces. There are several directions this could lead. For one, as > > > > Dinesh Subhraveti often points out, the current "network" namespace is > > > > really a network device namespace. If we instead namespace at the > > > > bind/connect/etc calls, we end up with much different abilities. > > > > > > The LSM layer supports access controls on abstract sockets, with at > > > least two (AppArmor, SELinux) providing abstract socket access > > > controls, other LSMs may provide controls as well. > > > > Good point. And for Stefan that may suffice, so thanks for mentioning > > that. But The LSM layer is mandatory access control for use by the > > admins. That doesn't help an unprivileged user. > > Individual LSMs may implement mandatory access control models, but > that is not an inherent requirement imposed by the LSM layer. While > the Landlock LSM does not (yet?) support access controls for abstract > sockets, it is a discretionary access control mechanism. In 2005, before namespaces were upstreamed, I posted the 'bsdjail' LSM, which briefly made it into the -mm kernel, but was eventually rejected as being an abuse of the LSM interface for OS level virtualization :) It's not 100% clear to me whether Stefan only wants isolation, or wants something closer to virtualization. 
Stefan, would an LSM allowing you to isolate certain processes from some abstract unix socket paths (or by label, whatever) suffice for you? > I'm not currently aware of a discretionary access control LSM that > supports abstract socket access control, but such a LSM should be > possible if someone wanted to implement one. > > -- > paul-moore.com
On Tue, Oct 24, 2023 at 10:18 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> On Tue, Oct 24, 2023 at 10:14:29AM -0400, Paul Moore wrote:
> > On Tue, Oct 24, 2023 at 9:46 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> > > On Sun, Dec 18, 2022 at 08:29:10PM +0100, Stefan Bavendiek wrote:
> > > > When building userspace application sandboxes, one issue that does not seem trivial to solve is the isolation of abstract sockets.
> > >
> > > Veeery late reply. Have you had any productive discussions about this in
> > > other threads or venues?
> > >
> > > > While most IPC mechanism can be isolated by mechanisms like mount namespaces, abstract sockets are part of the network namespace.
> > > > It is possible to isolate abstract sockets by using a new network namespace, however, unprivileged processes can only create a new empty network namespace, which removes network access as well and makes this useless for network clients.
> > > >
> > > > Same linux sandbox projects try to solve this by bridging the existing network interfaces into the new namespace or use something like slirp4netns to archive this, but this does not look like an ideal solution to this problem, especially since sandboxing should reduce the kernel attack surface without introducing more complexity.
> > > >
> > > > Aside from containers using namespaces, sandbox implementations based on seccomp and landlock would also run into the same problem, since landlock only provides file system isolation and seccomp cannot filter the path argument and therefore it can only be used to block new unix domain socket connections completely.
> > > >
> > > > Currently there does not seem to be any way to disable network namespaces in the kernel without also disabling unix domain sockets.
> > > >
> > > > The question is how to solve the issue of abstract socket isolation in a clean and efficient way, possibly even without namespaces.
> > > > What would be the ideal way to implement a mechanism to disable abstract sockets either globally or even better, in the context of a process.
> > > > And would such a patch have a realistic chance to make it into the kernel?
> > >
> > > Disabling them altogether would break lots of things depending on them,
> > > like X :) (@/tmp/.X11-unix/X0). The other path is to reconsider network
> > > namespaces. There are several directions this could lead. For one, as
> > > Dinesh Subhraveti often points out, the current "network" namespace is
> > > really a network device namespace. If we instead namespace at the
> > > bind/connect/etc calls, we end up with much different abilities.
> >
> > The LSM layer supports access controls on abstract sockets, with at
> > least two (AppArmor, SELinux) providing abstract socket access
> > controls, other LSMs may provide controls as well.
>
> Good point. And for Stefan that may suffice, so thanks for mentioning
> that. But The LSM layer is mandatory access control for use by the
> admins. That doesn't help an unprivileged user.
Individual LSMs may implement mandatory access control models, but
that is not an inherent requirement imposed by the LSM layer. While
the Landlock LSM does not (yet?) support access controls for abstract
sockets, it is a discretionary access control mechanism.
I'm not currently aware of a discretionary access control LSM that
supports abstract socket access control, but such an LSM should be
possible if someone wanted to implement one.
--
paul-moore.com
On Tue, Oct 24, 2023 at 10:14:29AM -0400, Paul Moore wrote:
> On Tue, Oct 24, 2023 at 9:46 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> > On Sun, Dec 18, 2022 at 08:29:10PM +0100, Stefan Bavendiek wrote:
> > > When building userspace application sandboxes, one issue that does not seem trivial to solve is the isolation of abstract sockets.
> >
> > Veeery late reply. Have you had any productive discussions about this in
> > other threads or venues?
> >
> > > While most IPC mechanism can be isolated by mechanisms like mount namespaces, abstract sockets are part of the network namespace.
> > > It is possible to isolate abstract sockets by using a new network namespace, however, unprivileged processes can only create a new empty network namespace, which removes network access as well and makes this useless for network clients.
> > >
> > > Same linux sandbox projects try to solve this by bridging the existing network interfaces into the new namespace or use something like slirp4netns to archive this, but this does not look like an ideal solution to this problem, especially since sandboxing should reduce the kernel attack surface without introducing more complexity.
> > >
> > > Aside from containers using namespaces, sandbox implementations based on seccomp and landlock would also run into the same problem, since landlock only provides file system isolation and seccomp cannot filter the path argument and therefore it can only be used to block new unix domain socket connections completely.
> > >
> > > Currently there does not seem to be any way to disable network namespaces in the kernel without also disabling unix domain sockets.
> > >
> > > The question is how to solve the issue of abstract socket isolation in a clean and efficient way, possibly even without namespaces.
> > > What would be the ideal way to implement a mechanism to disable abstract sockets either globally or even better, in the context of a process.
> > > And would such a patch have a realistic chance to make it into the kernel?
> >
> > Disabling them altogether would break lots of things depending on them,
> > like X :) (@/tmp/.X11-unix/X0). The other path is to reconsider network
> > namespaces. There are several directions this could lead. For one, as
> > Dinesh Subhraveti often points out, the current "network" namespace is
> > really a network device namespace. If we instead namespace at the
> > bind/connect/etc calls, we end up with much different abilities.
>
> The LSM layer supports access controls on abstract sockets, with at
> least two (AppArmor, SELinux) providing abstract socket access
> controls, other LSMs may provide controls as well.
Good point. And for Stefan that may suffice, so thanks for mentioning
that. But the LSM layer is mandatory access control for use by the
admins. That doesn't help an unprivileged user.
Thanks for the reply. Do you have any papers which came out of this r&d phase? Sounds very interesting. > Multiple NS' sharing an IP stack would exhaust ephemeral ranges faster Yes, but that could be a feature. I think of it as: I'm unprivileged user serge, and I want to fire off firefox in a whatzit-namespace so that I can redirect or forbid some connections. In this case, the admins have not agreed to let me double my resource usage, so the fact that the new namespace is sharing mine is a feature. And this lets me use network-namespace-like features completely unprivileged, without having to use a setuid-root helper to hook up a bridge. But, I didn't send this reply to advocate this approach. My main point was to mention that "network namespaces are network device namespaces" and hope that others would bring other suggestions for alternatives. -serge On Tue, Oct 24, 2023 at 10:05:29AM -0400, Boris Lukashev wrote: > Namespacing at OSI4 seems a bit fraught as the underlying route, mac, and physdev fall outside the callers control. Multiple NS' sharing an IP stack would exhaust ephemeral ranges faster (likely asymmetrically too) and have bound socket collisions opaque to each other requiring handling outside the NS/containers purview. We looked at this sort of thing during the r&d phase of our assured comms work (namespaces were young) and found a bunch of overhead and collision concerns. Not saying it can't be done, but getting consumers to play nice enough with such an approach may be a heavy lift. > > Thanks, > -Boris > > > On October 24, 2023 9:46:08 AM EDT, "Serge E. Hallyn" <serge@hallyn.com> wrote: > >On Sun, Dec 18, 2022 at 08:29:10PM +0100, Stefan Bavendiek wrote: > >> When building userspace application sandboxes, one issue that does not seem trivial to solve is the isolation of abstract sockets. > > > >Veeery late reply. Have you had any productive discussions about this in > >other threads or venues? 
> > > >> While most IPC mechanism can be isolated by mechanisms like mount namespaces, abstract sockets are part of the network namespace. > >> It is possible to isolate abstract sockets by using a new network namespace, however, unprivileged processes can only create a new empty network namespace, which removes network access as well and makes this useless for network clients. > >> > >> Same linux sandbox projects try to solve this by bridging the existing network interfaces into the new namespace or use something like slirp4netns to archive this, but this does not look like an ideal solution to this problem, especially since sandboxing should reduce the kernel attack surface without introducing more complexity. > >> > >> Aside from containers using namespaces, sandbox implementations based on seccomp and landlock would also run into the same problem, since landlock only provides file system isolation and seccomp cannot filter the path argument and therefore it can only be used to block new unix domain socket connections completely. > >> > >> Currently there does not seem to be any way to disable network namespaces in the kernel without also disabling unix domain sockets. > >> > >> The question is how to solve the issue of abstract socket isolation in a clean and efficient way, possibly even without namespaces. > >> What would be the ideal way to implement a mechanism to disable abstract sockets either globally or even better, in the context of a process. > >> And would such a patch have a realistic chance to make it into the kernel? > > > >Disabling them altogether would break lots of things depending on them, > >like X :) (@/tmp/.X11-unix/X0). The other path is to reconsider network > >namespaces. There are several directions this could lead. For one, as > >Dinesh Subhraveti often points out, the current "network" namespace is > >really a network device namespace. If we instead namespace at the > >bind/connect/etc calls, we end up with much different abilities. 
You > >can implement something like this today using seccomp-filter. > > > >-serge
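To put rough numbers on the ephemeral-range point discussed above, here is a minimal sketch. It assumes the stock Linux default for net.ipv4.ip_local_port_range (32768 to 60999) and an even split between namespaces, which is a simplification: real exhaustion depends on the connection 4-tuple, so this is only an upper bound for connections to a single destination address and port. The function name is made up for illustration.

```python
def ports_per_namespace(namespaces, low=32768, high=60999):
    """Even share of the ephemeral port range per namespace.

    low/high default to the usual Linux net.ipv4.ip_local_port_range
    values; an even split is an assumption made for illustration only.
    """
    if namespaces < 1:
        raise ValueError("need at least one namespace")
    total = high - low + 1  # the range is inclusive on both ends
    return total // namespaces


if __name__ == "__main__":
    for n in (1, 2, 4, 16):
        print(f"{n:2d} namespace(s) sharing one stack: "
              f"{ports_per_namespace(n)} ports each")
```

With one namespace the full 28232 ports are available; four namespaces sharing the same stack would get about 7058 each under this even-split assumption, which is the "bounded consumption" effect described above.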
On Tue, Oct 24, 2023 at 9:46 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> On Sun, Dec 18, 2022 at 08:29:10PM +0100, Stefan Bavendiek wrote:
> > When building userspace application sandboxes, one issue that does not seem trivial to solve is the isolation of abstract sockets.
>
> Veeery late reply. Have you had any productive discussions about this in
> other threads or venues?
>
> > While most IPC mechanism can be isolated by mechanisms like mount namespaces, abstract sockets are part of the network namespace.
> > It is possible to isolate abstract sockets by using a new network namespace, however, unprivileged processes can only create a new empty network namespace, which removes network access as well and makes this useless for network clients.
> >
> > Same linux sandbox projects try to solve this by bridging the existing network interfaces into the new namespace or use something like slirp4netns to archive this, but this does not look like an ideal solution to this problem, especially since sandboxing should reduce the kernel attack surface without introducing more complexity.
> >
> > Aside from containers using namespaces, sandbox implementations based on seccomp and landlock would also run into the same problem, since landlock only provides file system isolation and seccomp cannot filter the path argument and therefore it can only be used to block new unix domain socket connections completely.
> >
> > Currently there does not seem to be any way to disable network namespaces in the kernel without also disabling unix domain sockets.
> >
> > The question is how to solve the issue of abstract socket isolation in a clean and efficient way, possibly even without namespaces.
> > What would be the ideal way to implement a mechanism to disable abstract sockets either globally or even better, in the context of a process.
> > And would such a patch have a realistic chance to make it into the kernel?
>
> Disabling them altogether would break lots of things depending on them,
> like X :) (@/tmp/.X11-unix/X0). The other path is to reconsider network
> namespaces. There are several directions this could lead. For one, as
> Dinesh Subhraveti often points out, the current "network" namespace is
> really a network device namespace. If we instead namespace at the
> bind/connect/etc calls, we end up with much different abilities.
The LSM layer supports access controls on abstract sockets, with at
least two (AppArmor, SELinux) providing abstract socket access
controls; other LSMs may provide controls as well.
--
paul-moore.com
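For readers less familiar with what is being isolated here, a minimal sketch of an abstract socket on Linux: the address starts with a NUL byte, no filesystem entry is created, and so mount namespaces or path-based rules have nothing to act on. The socket name and helper function below are hypothetical and chosen only for this illustration.

```python
import os
import socket

# Hypothetical name; the leading NUL byte selects the abstract namespace.
NAME = b"\0demo-abstract-" + str(os.getpid()).encode()


def abstract_roundtrip(name, payload):
    """Bind an abstract AF_UNIX socket, connect to it, pass one message."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(name)      # Linux only: nothing appears on disk
        server.listen(1)
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(name)        # queued in the listen backlog
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)
                return conn.recv(len(payload))


if __name__ == "__main__":
    print(abstract_roundtrip(NAME, b"ping"))
    # No file with that name exists, so there is no path to hide or deny.
    print(os.path.exists(NAME[1:].decode()))
```

Any process in the same network namespace that knows the name can connect, regardless of its mount namespace, which is the gap the thread is trying to close.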
Namespacing at OSI layer 4 seems a bit fraught, as the underlying route,
MAC, and physdev fall outside the caller's control. Multiple namespaces
sharing an IP stack would exhaust ephemeral port ranges faster (likely
asymmetrically, too) and would have bound-socket collisions that are
opaque to each other, requiring handling outside the purview of the
namespace or container. We looked at this sort of thing during the R&D
phase of our assured comms work (namespaces were young) and found a
bunch of overhead and collision concerns. Not saying it can't be done,
but getting consumers to play nice enough with such an approach may be a
heavy lift.

Thanks,
-Boris

On October 24, 2023 9:46:08 AM EDT, "Serge E. Hallyn" <serge@hallyn.com> wrote:
>On Sun, Dec 18, 2022 at 08:29:10PM +0100, Stefan Bavendiek wrote:
>> When building userspace application sandboxes, one issue that does not seem trivial to solve is the isolation of abstract sockets.
>
>Veeery late reply. Have you had any productive discussions about this in
>other threads or venues?
>
>> While most IPC mechanism can be isolated by mechanisms like mount namespaces, abstract sockets are part of the network namespace.
>> It is possible to isolate abstract sockets by using a new network namespace, however, unprivileged processes can only create a new empty network namespace, which removes network access as well and makes this useless for network clients.
>>
>> Same linux sandbox projects try to solve this by bridging the existing network interfaces into the new namespace or use something like slirp4netns to archive this, but this does not look like an ideal solution to this problem, especially since sandboxing should reduce the kernel attack surface without introducing more complexity.
>>
>> Aside from containers using namespaces, sandbox implementations based on seccomp and landlock would also run into the same problem, since landlock only provides file system isolation and seccomp cannot filter the path argument and therefore it can only be used to block new unix domain socket connections completely.
>>
>> Currently there does not seem to be any way to disable network namespaces in the kernel without also disabling unix domain sockets.
>>
>> The question is how to solve the issue of abstract socket isolation in a clean and efficient way, possibly even without namespaces.
>> What would be the ideal way to implement a mechanism to disable abstract sockets either globally or even better, in the context of a process.
>> And would such a patch have a realistic chance to make it into the kernel?
>
>Disabling them altogether would break lots of things depending on them,
>like X :) (@/tmp/.X11-unix/X0). The other path is to reconsider network
>namespaces. There are several directions this could lead. For one, as
>Dinesh Subhraveti often points out, the current "network" namespace is
>really a network device namespace. If we instead namespace at the
>bind/connect/etc calls, we end up with much different abilities. You
>can implement something like this today using seccomp-filter.
>
>-serge
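The bound-socket collision concern raised in the reply above can be
reproduced without any namespaces at all. The sketch below is only an
illustration, not anything proposed in the thread: two sockets stand in
for two hypothetical sandboxes that share one IP stack, and the helper
name second_bind_errno() is made up for this example. The second bind to
the same address fails, and nothing tells the second caller who holds
the address.

```python
import errno
import socket


def second_bind_errno() -> int:
    """Bind the same TCP address twice; return the errno of the second bind.

    Returns 0 if the second bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the kernel to pick a free port from the ephemeral range.
        first.bind(("127.0.0.1", 0))
        first.listen(1)
        addr = first.getsockname()
        try:
            # The second "sandbox" tries the same address on the shared stack.
            second.bind(addr)
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        second.close()
        first.close()


def collides() -> bool:
    """True when the second bind was rejected with EADDRINUSE."""
    return second_bind_errno() == errno.EADDRINUSE
```

With namespacing applied only at bind/connect time, each sandbox would
see just the EADDRINUSE result, which is the opaque collision described
above.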
On Sun, Dec 18, 2022 at 08:29:10PM +0100, Stefan Bavendiek wrote:
> When building userspace application sandboxes, one issue that does not seem trivial to solve is the isolation of abstract sockets.

Veeery late reply. Have you had any productive discussions about this in
other threads or venues?

> While most IPC mechanism can be isolated by mechanisms like mount namespaces, abstract sockets are part of the network namespace.
> It is possible to isolate abstract sockets by using a new network namespace, however, unprivileged processes can only create a new empty network namespace, which removes network access as well and makes this useless for network clients.
>
> Same linux sandbox projects try to solve this by bridging the existing network interfaces into the new namespace or use something like slirp4netns to archive this, but this does not look like an ideal solution to this problem, especially since sandboxing should reduce the kernel attack surface without introducing more complexity.
>
> Aside from containers using namespaces, sandbox implementations based on seccomp and landlock would also run into the same problem, since landlock only provides file system isolation and seccomp cannot filter the path argument and therefore it can only be used to block new unix domain socket connections completely.
>
> Currently there does not seem to be any way to disable network namespaces in the kernel without also disabling unix domain sockets.
>
> The question is how to solve the issue of abstract socket isolation in a clean and efficient way, possibly even without namespaces.
> What would be the ideal way to implement a mechanism to disable abstract sockets either globally or even better, in the context of a process.
> And would such a patch have a realistic chance to make it into the kernel?

Disabling them altogether would break lots of things depending on them,
like X :) (@/tmp/.X11-unix/X0). The other path is to reconsider network
namespaces. There are several directions this could lead. For one, as
Dinesh Subhraveti often points out, the current "network" namespace is
really a network device namespace. If we instead namespace at the
bind/connect/etc calls, we end up with much different abilities. You
can implement something like this today using seccomp-filter.

-serge
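For readers less familiar with the abstract sockets discussed above, a
minimal sketch follows, assuming a Linux host. An AF_UNIX address whose
first byte is NUL lives in the abstract namespace: it has no filesystem
entry, so mount namespaces and path-based controls do not apply to it,
and its visibility follows the network namespace instead. The socket
name and the function name abstract_roundtrip() are invented for this
example.

```python
import os
import socket

# Hypothetical name for this sketch. The leading NUL byte is what makes the
# address "abstract" on Linux; tools usually display it as a leading '@'.
ABSTRACT_NAME = b"\0demo-abstract-" + str(os.getpid()).encode()


def abstract_roundtrip(name: bytes = ABSTRACT_NAME) -> bytes:
    """Bind, connect, and pass one message over an abstract unix socket."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # No file is created by this bind, and no unlink is needed later:
        # the name disappears when the last reference to the socket closes.
        server.bind(name)
        server.listen(1)
        # Any process in the same network namespace that knows the name
        # can connect; file permissions play no part.
        client.connect(name)
        conn, _ = server.accept()
        with conn:
            client.sendall(b"ping")
            return conn.recv(4)
    finally:
        client.close()
        server.close()
```

Because the only key is the name itself, isolating such a socket today
means either a separate network namespace or an access-control layer
that understands abstract addresses.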
On Wed, Oct 11, 2023 at 08:22:49AM +0200, Greg KH wrote:
> What b4 option does a "I applied this patch" response? The
> --cc-trailers option to 'shazam'? Or something else?
If you're using "b4 shazam", then it'll keep a record of it already and
you can use "b4 ty" to send the "thank you" email to the thread.
--
Kees Cook