* [BUG] Race between policy reload sidtab conversion and live conversion @ 2021-02-23 21:43 Tyler Hicks 2021-02-23 21:50 ` Tyler Hicks 0 siblings, 1 reply; 16+ messages in thread From: Tyler Hicks @ 2021-02-23 21:43 UTC (permalink / raw) To: Paul Moore, Stephen Smalley, Ondrej Mosnacek; +Cc: selinux, linux-kernel I'm seeing a race during policy load while the "regular" sidtab conversion is happening and a live conversion starts to take place in sidtab_context_to_sid(). We have an initial policy that's loaded by systemd ~0.6s into boot and then another policy gets loaded ~2-3s into boot. That second policy load is what hits the race condition situation because the sidtab is only partially populated and there's a decent amount of filesystem operations happening, at the same time, which are triggering live conversions. [ 3.091910] Unable to handle kernel paging request at virtual address 001303e1aa140408 [ 3.100083] Mem abort info: [ 3.102963] ESR = 0x96000004 [ 3.102965] EC = 0x25: DABT (current EL), IL = 32 bits [ 3.102967] SET = 0, FnV = 0 [ 3.102968] EA = 0, S1PTW = 0 [ 3.102969] Data abort info: [ 3.102970] ISV = 0, ISS = 0x00000004 [ 3.102971] CM = 0, WnR = 0 [ 3.102973] [001303e1aa140408] address between user and kernel address ranges [ 3.102977] Internal error: Oops: 96000004 [#1] SMP [ 3.102981] Modules linked in: [ 3.111250] bnxt_en pcie_iproc_platform pcie_iproc diagbe(O) [ 3.111259] CPU: 0 PID: 529 Comm: (tservice) Tainted: G O 5.10.17.1 #1 [ 3.119881] Hardware name: redacted (DT) [ 3.119884] pstate: 20400085 (nzCv daIf +PAN -UAO -TCO BTYPE=--) [ 3.119898] pc : sidtab_do_lookup (/usr/src/kernel/security/selinux/ss/sidtab.c:202) [ 3.119902] lr : sidtab_context_to_sid (/usr/src/kernel/security/selinux/ss/sidtab.c:312) [ 3.126105] sp : ffff800011ceb810 [ 3.126106] x29: ffff800011ceb810 x28: 0000000000000000 [ 3.126108] x27: 0000000000000005 x26: ffffda109f3f2000 [ 3.126110] x25: 00000000ffffffff x24: 0000000000000000 [ 3.126113] x23: 0000000000000001 [ 3.133124] x22: 0000000000000005 [ 3.133125] x21: aa1303e1aa140408 x20: 0000000000000001 [ 3.133127] x19: 00000000000000cc x18: 0000000000000003 [ 3.133128] x17: 000000000000003e x16: 000000000000003f [ 3.145519] x15: 0000000000000039 x14: 000000000000002e [ 3.145521] x13: 0000000058294db1 x12: 00000000158294db [ 3.145523] x11: 000000007f0b3af2 x10: 0000000000004e00 [ 3.145525] x9 : 00000000000000cd x8 : 0000000000000005 [ 3.281289] x7 : feff735e62647764 x6 : 00000000000080cc [ 3.286769] x5 : 0000000000000005 x4 : ffff3f47c5b20000 [ 3.292249] x3 : ffff800011ceb900 x2 : 0000000000000001 [ 3.297729] x1 : 00000000000000cc x0 : aa1303e1aa1403e0 [ 3.303210] Call trace: [ 3.305733] sidtab_do_lookup (/usr/src/kernel/security/selinux/ss/sidtab.c:202) [ 3.309867] sidtab_context_to_sid (/usr/src/kernel/security/selinux/ss/sidtab.c:312) [ 3.314451] security_context_to_sid_core (/usr/src/kernel/security/selinux/ss/services.c:1557) [ 3.319661] security_context_to_sid_default (/usr/src/kernel/security/selinux/ss/services.c:1616) [ 3.324961] inode_doinit_use_xattr (/usr/src/kernel/security/selinux/hooks.c:1366) [ 3.329634] inode_doinit_with_dentry (/usr/src/kernel/security/selinux/hooks.c:1457) [ 3.334486] selinux_d_instantiate (/usr/src/kernel/security/selinux/hooks.c:6278) [ 3.338889] security_d_instantiate (/usr/src/kernel/security/security.c:2004) [ 3.343385] d_splice_alias (/usr/src/kernel/fs/dcache.c:3030) [ 3.347251] squashfs_lookup (/usr/src/kernel/fs/squashfs/namei.c:220) [ 3.385561] el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428) [ 3.389517] el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671) [ 3.392939] Code: 51002718 340001d7 1ad82768 8b284c15 (f94002a0) All code ======== 0: 18 27 sbb %ah,(%rdi) 2: 00 51 d7 add %dl,-0x29(%rcx) 5: 01 00 add %eax,(%rax) 7: 34 68 xor $0x68,%al 9: 27 (bad) a: d8 1a fcomps (%rdx) c:* 15 4c 28 8b a0 adc $0xa08b284c,%eax <-- trapping instruction 11: 02 40 f9 add -0x7(%rax),%al Code starting with the faulting instruction =========================================== 0: a0 .byte 0xa0 1: 02 40 f9 add -0x7(%rax),%al [ 3.399230] ---[ end trace cc1840b3ff2c7506 ]--- The corresponding source from sidtab.c: 179 static struct sidtab_entry *sidtab_do_lookup(struct sidtab *s, u32 index, 180 int alloc) 181 { ... 193 /* lookup inside the subtree */ 194 entry = &s->roots[level]; 195 while (level != 0) { 196 capacity_shift -= SIDTAB_INNER_SHIFT; 197 --level; 198 199 entry = &entry->ptr_inner->entries[leaf_index >> capacity_shift]; 200 leaf_index &= ((u32)1 << capacity_shift) - 1; 201 202 if (!entry->ptr_inner) { 203 if (alloc) 204 entry->ptr_inner = kzalloc(SIDTAB_NODE_ALLOC_SIZE, 205 GFP_ATOMIC); 206 if (!entry->ptr_inner) 207 return NULL; 208 } 209 } 210 if (!entry->ptr_leaf) { 211 if (alloc) 212 entry->ptr_leaf = kzalloc(SIDTAB_NODE_ALLOC_SIZE, 213 GFP_ATOMIC); 214 if (!entry->ptr_leaf) 215 return NULL; 216 } 217 return &entry->ptr_leaf->entries[index % SIDTAB_LEAF_ENTRIES]; 218 } ... 263 int sidtab_context_to_sid(struct sidtab *s, struct context *context, 264 u32 *sid) 265 { ... 305 /* 306 * if we are building a new sidtab, we need to convert the context 307 * and insert it there as well 308 */ 309 if (convert) { 310 rc = -ENOMEM; 311 dst_convert = sidtab_do_lookup(convert->target, count, 1); 312 if (!dst_convert) { 313 context_destroy(&dst->context); 314 goto out_unlock; 315 } ... What I'm having trouble understanding is how the above call to sidtab_do_lookup(), on the target sidtab that's undergoing a conversion in sidtab_convert(), can be expected to work. sidtab_convert_tree() is allocating and initializing ptr_inner sidtab nodes at the same time sidtab_do_lookup() is trying to use them with no locking being performed on the target sidtab. Ondrej specifically mentions, in commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve performance"), that there's no need to freeze the sidtab during policy reloads so I know that there's thought given to these code paths running in parallel. Can someone more knowledgeable on how the sidtab locking is expected to work suggest a fix for this crash? Tyler ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-23 21:43 [BUG] Race between policy reload sidtab conversion and live conversion Tyler Hicks @ 2021-02-23 21:50 ` Tyler Hicks 2021-02-23 22:36 ` Tyler Hicks 0 siblings, 1 reply; 16+ messages in thread From: Tyler Hicks @ 2021-02-23 21:50 UTC (permalink / raw) To: Paul Moore, Stephen Smalley, Ondrej Mosnacek; +Cc: selinux, linux-kernel On 2021-02-23 15:43:48, Tyler Hicks wrote: > I'm seeing a race during policy load while the "regular" sidtab > conversion is happening and a live conversion starts to take place in > sidtab_context_to_sid(). > > We have an initial policy that's loaded by systemd ~0.6s into boot and > then another policy gets loaded ~2-3s into boot. That second policy load > is what hits the race condition situation because the sidtab is only > partially populated and there's a decent amount of filesystem operations > happening, at the same time, which are triggering live conversions. Here are the relevant messages written to the ring buffer showing the initial policy load, the second policy load, the sidtab conversion starting, and then the stack trace: [ 0.112159] SELinux: Initializing. ... [ 0.624107] audit: type=1404 audit(0.492:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295 enabled=1 old-enabled=1 lsm=selinux res=1 [ 0.664063] SELinux: Permission perfmon in class capability2 not defined in policy. [ 0.664068] SELinux: Permission bpf in class capability2 not defined in policy. [ 0.664070] SELinux: Permission checkpoint_restore in class capability2 not defined in policy. [ 0.664077] SELinux: Permission perfmon in class cap2_userns not defined in policy. [ 0.664079] SELinux: Permission bpf in class cap2_userns not defined in policy. [ 0.664080] SELinux: Permission checkpoint_restore in class cap2_userns not defined in policy. [ 0.664114] SELinux: Class perf_event not defined in policy. [ 0.664115] SELinux: Class lockdown not defined in policy. [ 0.664117] SELinux: the above unknown classes and permissions will be allowed [ 0.667863] SELinux: policy capability network_peer_controls=1 [ 0.667866] SELinux: policy capability open_perms=1 [ 0.667867] SELinux: policy capability extended_socket_class=1 [ 0.667868] SELinux: policy capability always_check_network=0 [ 0.667870] SELinux: policy capability cgroup_seclabel=1 [ 0.667871] SELinux: policy capability nnp_nosuid_transition=1 [ 0.667873] SELinux: policy capability genfs_seclabel_symlinks=0 [ 0.712152] audit: type=1403 audit(0.584:3): auid=4294967295 ses=4294967295 lsm=selinux res=1 [ 0.714430] systemd[1]: Successfully loaded SELinux policy in 91.004ms. ... [ 3.042735] SELinux: Permission perfmon in class capability2 not defined in policy. [ 3.042741] SELinux: Permission bpf in class capability2 not defined in policy. [ 3.042743] SELinux: Permission checkpoint_restore in class capability2 not defined in policy. [ 3.042750] SELinux: Permission perfmon in class cap2_userns not defined in policy. [ 3.042751] SELinux: Permission bpf in class cap2_userns not defined in policy. [ 3.042753] SELinux: Permission checkpoint_restore in class cap2_userns not defined in policy. [ 3.042787] SELinux: Class perf_event not defined in policy. [ 3.042789] SELinux: Class lockdown not defined in policy. [ 3.042791] SELinux: the above unknown classes and permissions will be allowed [ 3.042801] SELinux: Converting 208 SID table entries... [ 3.044951] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000040000028 [ 3.050919] SELinux: policy capability network_peer_controls=1 [ 3.055990] Mem abort info: [ 3.055992] ESR = 0x96000004 [ 3.055994] EC = 0x25: DABT (current EL), IL = 32 bits [ 3.055995] SET = 0, FnV = 0 [ 3.055996] EA = 0, S1PTW = 0 [ 3.055997] Data abort info: [ 3.055998] ISV = 0, ISS = 0x00000004 [ 3.055999] CM = 0, WnR = 0 [ 3.056002] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000940ea000 [ 3.056003] [0000000040000028] pgd=0000000000000000, p4d=0000000000000000 [ 3.058894] SELinux: policy capability open_perms=1 [ 3.062037] [ 3.062041] Internal error: Oops: 96000004 [#1] SMP [ 3.062044] Modules linked in: bnxt_en pcie_iproc_platform [ 3.067530] SELinux: policy capability extended_socket_class=1 [ 3.070674] pcie_iproc diagbe(O) [ 3.070680] CPU: 1 PID: 512 Comm: (tservice) Tainted: G O 5.10.17.1 #1 [ 3.070683] Hardware name: redacted (DT) [ 3.073924] SELinux: policy capability always_check_network=0 [ 3.076890] pstate: 20400085 (nzCv daIf +PAN -UAO -TCO BTYPE=--) [ 3.076904] pc : sidtab_do_lookup+0x128/0x1b4 [ 3.076908] lr : sidtab_context_to_sid+0x210/0x378 [ 3.080864] SELinux: policy capability cgroup_seclabel=1 [ 3.083920] sp : ffff800011bfbaf0 [ 3.083922] x29: ffff800011bfbaf0 [ 3.090570] SELinux: policy capability nnp_nosuid_transition=1 [ 3.097564] x28: 0000000000000000 [ 3.097566] x27: 0000000000000005 x26: ffffd1b6451f2000 [ 3.097568] x25: 00000000ffffffff x24: 0000000000000000 [ 3.102693] SELinux: policy capability genfs_seclabel_symlinks=0 [ 3.104225] x23: 0000000000000001 x22: 0000000000000005 [ 3.104227] x21: 0000000040000028 x20: 0000000000000001 [ 3.104230] x19: 00000000000000d0 x18: 5b00000000000000 [ 3.217640] x17: 0000000077493ecb x16: 00000000c2b2ae35 [ 3.223120] x15: 00000000639285d3 x14: 0000000000000010 [ 3.228600] x13: 00000000128b8525 x12: 000000005128b852 [ 3.234079] x11: 0000000049dd48c1 x10: 0000000000004e00 [ 3.239559] x9 : 00000000000000d1 x8 : 0000000000000005 [ 3.245039] x7 : 0000000000000001 x6 : 0000000000000000 [ 3.250518] x5 : ffff800011bfbc48 x4 : 0000000000000010 [ 3.255998] x3 : 0000000000000002 x2 : 0000000000000001 [ 3.261477] x1 : 00000000000000d0 x0 : 0000000040000000 [ 3.266957] Call trace: [ 3.269480] sidtab_do_lookup+0x128/0x1b4 [ 3.273615] sidtab_context_to_sid+0x210/0x378 [ 3.278198] security_compute_sid+0x3f4/0x60c [ 3.282692] security_transition_sid+0x30/0x38 [ 3.287277] selinux_bprm_creds_for_exec+0x11c/0x2e8 [ 3.292398] security_bprm_creds_for_exec+0x30/0x4c [ 3.297432] bprm_execve+0x100/0x1d4 [ 3.301118] do_execveat_common+0x1e8/0x228 [ 3.305432] __arm64_sys_execve+0x3c/0x4c [ 3.309569] do_el0_svc+0xec/0x154 [ 3.313079] el0_svc+0xc/0x14 [ 3.316139] el0_sync_handler+0x7c/0xd8 [ 3.320095] el0_sync+0x144/0x180 [ 3.323516] Code: 51002718 340001d7 1ad82768 8b284c15 (f94002a0) [ 3.329808] ---[ end trace 69ebb6381d57e49b ]--- Tyler ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-23 21:50 ` Tyler Hicks @ 2021-02-23 22:36 ` Tyler Hicks 2021-02-24 0:02 ` Paul Moore 2021-02-24 9:33 ` Ondrej Mosnacek 0 siblings, 2 replies; 16+ messages in thread From: Tyler Hicks @ 2021-02-23 22:36 UTC (permalink / raw) To: Paul Moore, Stephen Smalley, Ondrej Mosnacek; +Cc: selinux, linux-kernel On 2021-02-23 15:50:56, Tyler Hicks wrote: > On 2021-02-23 15:43:48, Tyler Hicks wrote: > > I'm seeing a race during policy load while the "regular" sidtab > > conversion is happening and a live conversion starts to take place in > > sidtab_context_to_sid(). > > > > We have an initial policy that's loaded by systemd ~0.6s into boot and > > then another policy gets loaded ~2-3s into boot. That second policy load > > is what hits the race condition situation because the sidtab is only > > partially populated and there's a decent amount of filesystem operations > > happening, at the same time, which are triggering live conversions. Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed change here: https://lore.kernel.org/selinux/20210212185930.130477-3-omosnace@redhat.com/ I'll put these changes through a validation run (the only place that I can seem to reproduce this crash) and see how it looks. Tyler ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-23 22:36 ` Tyler Hicks @ 2021-02-24 0:02 ` Paul Moore 2021-02-24 9:33 ` Ondrej Mosnacek 1 sibling, 0 replies; 16+ messages in thread From: Paul Moore @ 2021-02-24 0:02 UTC (permalink / raw) To: Tyler Hicks; +Cc: Stephen Smalley, Ondrej Mosnacek, selinux, linux-kernel On Tue, Feb 23, 2021 at 5:36 PM Tyler Hicks <tyhicks@linux.microsoft.com> wrote: > On 2021-02-23 15:50:56, Tyler Hicks wrote: > > On 2021-02-23 15:43:48, Tyler Hicks wrote: > > > I'm seeing a race during policy load while the "regular" sidtab > > > conversion is happening and a live conversion starts to take place in > > > sidtab_context_to_sid(). > > > > > > We have an initial policy that's loaded by systemd ~0.6s into boot and > > > then another policy gets loaded ~2-3s into boot. That second policy load > > > is what hits the race condition situation because the sidtab is only > > > partially populated and there's a decent amount of filesystem operations > > > happening, at the same time, which are triggering live conversions. > > Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed > change here: > > https://lore.kernel.org/selinux/20210212185930.130477-3-omosnace@redhat.com/ > > I'll put these changes through a validation run (the only place that I > can seem to reproduce this crash) and see how it looks. Thanks, please let us know what you find out. -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-23 22:36 ` Tyler Hicks 2021-02-24 0:02 ` Paul Moore @ 2021-02-24 9:33 ` Ondrej Mosnacek 2021-02-24 14:36 ` Tyler Hicks 2021-02-26 1:06 ` Paul Moore 1 sibling, 2 replies; 16+ messages in thread From: Ondrej Mosnacek @ 2021-02-24 9:33 UTC (permalink / raw) To: Tyler Hicks Cc: Paul Moore, Stephen Smalley, SElinux list, Linux kernel mailing list On Tue, Feb 23, 2021 at 11:37 PM Tyler Hicks <tyhicks@linux.microsoft.com> wrote: > On 2021-02-23 15:50:56, Tyler Hicks wrote: > > On 2021-02-23 15:43:48, Tyler Hicks wrote: > > > I'm seeing a race during policy load while the "regular" sidtab > > > conversion is happening and a live conversion starts to take place in > > > sidtab_context_to_sid(). > > > > > > We have an initial policy that's loaded by systemd ~0.6s into boot and > > > then another policy gets loaded ~2-3s into boot. That second policy load > > > is what hits the race condition situation because the sidtab is only > > > partially populated and there's a decent amount of filesystem operations > > > happening, at the same time, which are triggering live conversions. > > Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed > change here: > > https://lore.kernel.org/selinux/20210212185930.130477-3-omosnace@redhat.com/ > > I'll put these changes through a validation run (the only place that I > can seem to reproduce this crash) and see how it looks. Hm... I think there is actually another race condition introduced by the switch from rwlock to RCU [1]... Judging from the call trace you may be hitting that. Basically, before the switch the sidtab swapover worked like this: 1. Start live conversion of new entries. 2. Convert existing entries. [Still only the old sidtab is visible to readers here.] 3. Swap sidtab under write lock. 4. Now only the new sidtab is visible to readers, so the old one can be destroyed. After the switch to RCU, we now have: 1. Start live conversion of new entries. 2. Convert existing entries. 3. RCU-assign the new policy pointer to selinux_state. [!!! Now actually both old and new sidtab may be referenced by readers, since there is no synchronization barrier previously provided by the write lock.] 4. Wait for synchronize_rcu() to return. 5. Now only the new sidtab is visible to readers, so the old one can be destroyed. So the race can happen between 3. and 5., if one thread already sees the new sidtab and adds a new entry there, and a second thread still has the reference to the old sidtab and also tires to add a new entry; live-converting to the new sidtab, which it doesn't expect to change by itself. Unfortunately I failed to realize this when reviewing the patch :/ I think the only two options to fix it are A) switching back to read-write lock (the easy and safe way; undoing the performance benefits of [1]), or B) implementing a safe two-way live conversion of new sidtab entries, so that both tables are kept in sync while they are both available (more complicated and with possible tricky implications of different interpretations of contexts by the two policies). [1] 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU") -- Ondrej Mosnacek Software Engineer, Linux Security - SELinux kernel Red Hat, Inc. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-24 9:33 ` Ondrej Mosnacek @ 2021-02-24 14:36 ` Tyler Hicks 2021-02-25 16:38 ` Ondrej Mosnacek 2021-02-26 1:06 ` Paul Moore 1 sibling, 1 reply; 16+ messages in thread From: Tyler Hicks @ 2021-02-24 14:36 UTC (permalink / raw) To: Ondrej Mosnacek Cc: Paul Moore, Stephen Smalley, SElinux list, Linux kernel mailing list On 2021-02-24 10:33:46, Ondrej Mosnacek wrote: > On Tue, Feb 23, 2021 at 11:37 PM Tyler Hicks > <tyhicks@linux.microsoft.com> wrote: > > On 2021-02-23 15:50:56, Tyler Hicks wrote: > > > On 2021-02-23 15:43:48, Tyler Hicks wrote: > > > > I'm seeing a race during policy load while the "regular" sidtab > > > > conversion is happening and a live conversion starts to take place in > > > > sidtab_context_to_sid(). > > > > > > > > We have an initial policy that's loaded by systemd ~0.6s into boot and > > > > then another policy gets loaded ~2-3s into boot. That second policy load > > > > is what hits the race condition situation because the sidtab is only > > > > partially populated and there's a decent amount of filesystem operations > > > > happening, at the same time, which are triggering live conversions. > > > > Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed > > change here: > > > > https://lore.kernel.org/selinux/20210212185930.130477-3-omosnace@redhat.com/ > > > > I'll put these changes through a validation run (the only place that I > > can seem to reproduce this crash) and see how it looks. > > Hm... I think there is actually another race condition introduced by > the switch from rwlock to RCU [1]... Judging from the call trace you > may be hitting that. I believe your patches above fixed the race I was seeing. I was able to make it through a full validation run without any crashes. Without those patches applied, I would see several crashes resulting from this race over the course of a validation run. I'll continue to test with your changes and let you know if I end up running into the other race you spotted. Tyler > > Basically, before the switch the sidtab swapover worked like this: > 1. Start live conversion of new entries. > 2. Convert existing entries. > [Still only the old sidtab is visible to readers here.] > 3. Swap sidtab under write lock. > 4. Now only the new sidtab is visible to readers, so the old one can > be destroyed. > > After the switch to RCU, we now have: > 1. Start live conversion of new entries. > 2. Convert existing entries. > 3. RCU-assign the new policy pointer to selinux_state. > [!!! Now actually both old and new sidtab may be referenced by > readers, since there is no synchronization barrier previously provided > by the write lock.] > 4. Wait for synchronize_rcu() to return. > 5. Now only the new sidtab is visible to readers, so the old one can > be destroyed. > > So the race can happen between 3. and 5., if one thread already sees > the new sidtab and adds a new entry there, and a second thread still > has the reference to the old sidtab and also tires to add a new entry; > live-converting to the new sidtab, which it doesn't expect to change > by itself. Unfortunately I failed to realize this when reviewing the > patch :/ > > I think the only two options to fix it are A) switching back to > read-write lock (the easy and safe way; undoing the performance > benefits of [1]), or B) implementing a safe two-way live conversion of > new sidtab entries, so that both tables are kept in sync while they > are both available (more complicated and with possible tricky > implications of different interpretations of contexts by the two > policies). > > [1] 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU") > > -- > Ondrej Mosnacek > Software Engineer, Linux Security - SELinux kernel > Red Hat, Inc. > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-24 14:36 ` Tyler Hicks @ 2021-02-25 16:38 ` Ondrej Mosnacek 2021-02-25 16:45 ` Tyler Hicks 2021-02-25 23:27 ` Paul Moore 0 siblings, 2 replies; 16+ messages in thread From: Ondrej Mosnacek @ 2021-02-25 16:38 UTC (permalink / raw) To: Tyler Hicks Cc: Paul Moore, Stephen Smalley, SElinux list, Linux kernel mailing list On Wed, Feb 24, 2021 at 3:43 PM Tyler Hicks <tyhicks@linux.microsoft.com> wrote: > On 2021-02-24 10:33:46, Ondrej Mosnacek wrote: > > On Tue, Feb 23, 2021 at 11:37 PM Tyler Hicks > > <tyhicks@linux.microsoft.com> wrote: > > > On 2021-02-23 15:50:56, Tyler Hicks wrote: > > > > On 2021-02-23 15:43:48, Tyler Hicks wrote: > > > > > I'm seeing a race during policy load while the "regular" sidtab > > > > > conversion is happening and a live conversion starts to take place in > > > > > sidtab_context_to_sid(). > > > > > > > > > > We have an initial policy that's loaded by systemd ~0.6s into boot and > > > > > then another policy gets loaded ~2-3s into boot. That second policy load > > > > > is what hits the race condition situation because the sidtab is only > > > > > partially populated and there's a decent amount of filesystem operations > > > > > happening, at the same time, which are triggering live conversions. > > > > > > Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed > > > change here: > > > > > > https://lore.kernel.org/selinux/20210212185930.130477-3-omosnace@redhat.com/ > > > > > > I'll put these changes through a validation run (the only place that I > > > can seem to reproduce this crash) and see how it looks. > > > > Hm... I think there is actually another race condition introduced by > > the switch from rwlock to RCU [1]... Judging from the call trace you > > may be hitting that. > > I believe your patches above fixed the race I was seeing. I was able to > make it through a full validation run without any crashes. Without those > patches applied, I would see several crashes resulting from this race > over the course of a validation run. Hm... okay so probably you were indeed running into that bug. I tried to reproduce the other race (I added a BUG_ON to help detect it), but wasn't able to reproduce it with my (pretty aggressive) stress test. I only managed to trigger it by adding a conditional delay in the right place. So I now know the second bug is really there, though it' seems to be very unlikely to be hit in practice (might be more likely on systems with many CPU cores, though). The first bug, OTOH, is triggered almost instantly by my stress test. Unless someone objects, I'll start working on a patch to switch back to read-write lock for now. If all goes well, I'll send it sometime next week. > > I'll continue to test with your changes and let you know if I end up > running into the other race you spotted. Thanks, but given the results of my testing it's probably not worth trying :) > > Tyler > > > > > Basically, before the switch the sidtab swapover worked like this: > > 1. Start live conversion of new entries. > > 2. Convert existing entries. > > [Still only the old sidtab is visible to readers here.] > > 3. Swap sidtab under write lock. > > 4. Now only the new sidtab is visible to readers, so the old one can > > be destroyed. > > > > After the switch to RCU, we now have: > > 1. Start live conversion of new entries. > > 2. Convert existing entries. > > 3. RCU-assign the new policy pointer to selinux_state. > > [!!! Now actually both old and new sidtab may be referenced by > > readers, since there is no synchronization barrier previously provided > > by the write lock.] > > 4. Wait for synchronize_rcu() to return. > > 5. Now only the new sidtab is visible to readers, so the old one can > > be destroyed. > > > > So the race can happen between 3. and 5., if one thread already sees > > the new sidtab and adds a new entry there, and a second thread still > > has the reference to the old sidtab and also tires to add a new entry; > > live-converting to the new sidtab, which it doesn't expect to change > > by itself. Unfortunately I failed to realize this when reviewing the > > patch :/ > > > > I think the only two options to fix it are A) switching back to > > read-write lock (the easy and safe way; undoing the performance > > benefits of [1]), or B) implementing a safe two-way live conversion of > > new sidtab entries, so that both tables are kept in sync while they > > are both available (more complicated and with possible tricky > > implications of different interpretations of contexts by the two > > policies). > > > > [1] 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU") > > > > -- > > Ondrej Mosnacek > > Software Engineer, Linux Security - SELinux kernel > > Red Hat, Inc. > > > -- Ondrej Mosnacek Software Engineer, Linux Security - SELinux kernel Red Hat, Inc. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-25 16:38 ` Ondrej Mosnacek @ 2021-02-25 16:45 ` Tyler Hicks 2021-02-25 23:27 ` Paul Moore 1 sibling, 0 replies; 16+ messages in thread From: Tyler Hicks @ 2021-02-25 16:45 UTC (permalink / raw) To: Ondrej Mosnacek Cc: Paul Moore, Stephen Smalley, SElinux list, Linux kernel mailing list On 2021-02-25 17:38:25, Ondrej Mosnacek wrote: > On Wed, Feb 24, 2021 at 3:43 PM Tyler Hicks <tyhicks@linux.microsoft.com> wrote: > > On 2021-02-24 10:33:46, Ondrej Mosnacek wrote: > > > On Tue, Feb 23, 2021 at 11:37 PM Tyler Hicks > > > <tyhicks@linux.microsoft.com> wrote: > > > > On 2021-02-23 15:50:56, Tyler Hicks wrote: > > > > > On 2021-02-23 15:43:48, Tyler Hicks wrote: > > > > > > I'm seeing a race during policy load while the "regular" sidtab > > > > > > conversion is happening and a live conversion starts to take place in > > > > > > sidtab_context_to_sid(). > > > > > > > > > > > > We have an initial policy that's loaded by systemd ~0.6s into boot and > > > > > > then another policy gets loaded ~2-3s into boot. That second policy load > > > > > > is what hits the race condition situation because the sidtab is only > > > > > > partially populated and there's a decent amount of filesystem operations > > > > > > happening, at the same time, which are triggering live conversions. > > > > > > > > Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed > > > > change here: > > > > > > > > https://lore.kernel.org/selinux/20210212185930.130477-3-omosnace@redhat.com/ > > > > > > > > I'll put these changes through a validation run (the only place that I > > > > can seem to reproduce this crash) and see how it looks. > > > > > > Hm... I think there is actually another race condition introduced by > > > the switch from rwlock to RCU [1]... Judging from the call trace you > > > may be hitting that. > > > > I believe your patches above fixed the race I was seeing. I was able to > > make it through a full validation run without any crashes. Without those > > patches applied, I would see several crashes resulting from this race > > over the course of a validation run. > > Hm... okay so probably you were indeed running into that bug. I tried > to reproduce the other race (I added a BUG_ON to help detect it), but > wasn't able to reproduce it with my (pretty aggressive) stress test. I > only managed to trigger it by adding a conditional delay in the right > place. So I now know the second bug is really there, though it' seems > to be very unlikely to be hit in practice (might be more likely on > systems with many CPU cores, though). The first bug, OTOH, is > triggered almost instantly by my stress test. > > Unless someone objects, I'll start working on a patch to switch back > to read-write lock for now. If all goes well, I'll send it sometime > next week. > > > > > I'll continue to test with your changes and let you know if I end up > > running into the other race you spotted. > > Thanks, but given the results of my testing it's probably not worth trying :) Those changes have now survived through several validation runs. I can confidently say that they fix the race I was seeing. Tyler > > > > > Tyler > > > > > > > > Basically, before the switch the sidtab swapover worked like this: > > > 1. Start live conversion of new entries. > > > 2. Convert existing entries. > > > [Still only the old sidtab is visible to readers here.] > > > 3. Swap sidtab under write lock. > > > 4. Now only the new sidtab is visible to readers, so the old one can > > > be destroyed. > > > > > > After the switch to RCU, we now have: > > > 1. Start live conversion of new entries. > > > 2. Convert existing entries. > > > 3. RCU-assign the new policy pointer to selinux_state. > > > [!!! Now actually both old and new sidtab may be referenced by > > > readers, since there is no synchronization barrier previously provided > > > by the write lock.] > > > 4. Wait for synchronize_rcu() to return. > > > 5. Now only the new sidtab is visible to readers, so the old one can > > > be destroyed. > > > > > > So the race can happen between 3. and 5., if one thread already sees > > > the new sidtab and adds a new entry there, and a second thread still > > > has the reference to the old sidtab and also tires to add a new entry; > > > live-converting to the new sidtab, which it doesn't expect to change > > > by itself. Unfortunately I failed to realize this when reviewing the > > > patch :/ > > > > > > I think the only two options to fix it are A) switching back to > > > read-write lock (the easy and safe way; undoing the performance > > > benefits of [1]), or B) implementing a safe two-way live conversion of > > > new sidtab entries, so that both tables are kept in sync while they > > > are both available (more complicated and with possible tricky > > > implications of different interpretations of contexts by the two > > > policies). > > > > > > [1] 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU") > > > > > > -- > > > Ondrej Mosnacek > > > Software Engineer, Linux Security - SELinux kernel > > > Red Hat, Inc. > > > > > > > > -- > Ondrej Mosnacek > Software Engineer, Linux Security - SELinux kernel > Red Hat, Inc. > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-25 16:38 ` Ondrej Mosnacek 2021-02-25 16:45 ` Tyler Hicks @ 2021-02-25 23:27 ` Paul Moore 1 sibling, 0 replies; 16+ messages in thread From: Paul Moore @ 2021-02-25 23:27 UTC (permalink / raw) To: Ondrej Mosnacek Cc: Tyler Hicks, Stephen Smalley, SElinux list, Linux kernel mailing list On Thu, Feb 25, 2021 at 11:38 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > Unless someone objects, I'll start working on a patch to switch back > to read-write lock for now. If all goes well, I'll send it sometime > next week. Sorry, I was looking at other things and just got to this ... I'm not overly excited about switching back to the read-write lock so quickly, I'd rather we spend some additional time looking into resolving issues with the current RCU code. -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-24 9:33 ` Ondrej Mosnacek 2021-02-24 14:36 ` Tyler Hicks @ 2021-02-26 1:06 ` Paul Moore 2021-02-26 11:11 ` Ondrej Mosnacek [not found] ` <20210226040542.1137-1-hdanton@sina.com> 1 sibling, 2 replies; 16+ messages in thread From: Paul Moore @ 2021-02-26 1:06 UTC (permalink / raw) To: Ondrej Mosnacek Cc: Tyler Hicks, Stephen Smalley, SElinux list, Linux kernel mailing list On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > After the switch to RCU, we now have: > 1. Start live conversion of new entries. > 2. Convert existing entries. > 3. RCU-assign the new policy pointer to selinux_state. > [!!! Now actually both old and new sidtab may be referenced by > readers, since there is no synchronization barrier previously provided > by the write lock.] > 4. Wait for synchronize_rcu() to return. > 5. Now only the new sidtab is visible to readers, so the old one can > be destroyed. > > So the race can happen between 3. and 5., if one thread already sees > the new sidtab and adds a new entry there, and a second thread still > has the reference to the old sidtab and also tires to add a new entry; > live-converting to the new sidtab, which it doesn't expect to change > by itself. Unfortunately I failed to realize this when reviewing the > patch :/ It is possible I'm not fully understanding the problem and/or missing an important detail - it is rather tricky code, and RCU can be very hard to reason at times - but I think we may be able to solve this with some lock fixes inside sidtab_context_to_sid(). Let me try to explain to see if we are on the same page here ... The problem is when we have two (or more) threads trying to add/convert the same context into a sid; the task with new_sidtab is looking to add a new sidtab entry, while the task with old_sidtab is looking to convert an entry in old_sidtab into a new entry in new_sidtab. Boom. Looking at the code in sidtab_context_to_sid(), when we have two sidtabs that are currently active (old_sidtab->convert pointer is valid) and a task with old_sidtab attempts to add a new entry to both sidtabs it first adds it to the old sidtab then it also adds it to the new sidtab. I believe the problem is that in this case while the task grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which allows it to race with tasks that already see only new_sidtab. I think adding code to sidtab_context_to_sid() which grabs the new_sidtab->lock when adding entries to the new_sidtab *should* solve the problem. Did I miss something important? ;) -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-26 1:06 ` Paul Moore @ 2021-02-26 11:11 ` Ondrej Mosnacek 2021-02-28 19:21 ` Paul Moore [not found] ` <20210226040542.1137-1-hdanton@sina.com> 1 sibling, 1 reply; 16+ messages in thread From: Ondrej Mosnacek @ 2021-02-26 11:11 UTC (permalink / raw) To: Paul Moore Cc: Tyler Hicks, Stephen Smalley, SElinux list, Linux kernel mailing list On Fri, Feb 26, 2021 at 2:07 AM Paul Moore <paul@paul-moore.com> wrote: > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > > After the switch to RCU, we now have: > > 1. Start live conversion of new entries. > > 2. Convert existing entries. > > 3. RCU-assign the new policy pointer to selinux_state. > > [!!! Now actually both old and new sidtab may be referenced by > > readers, since there is no synchronization barrier previously provided > > by the write lock.] > > 4. Wait for synchronize_rcu() to return. > > 5. Now only the new sidtab is visible to readers, so the old one can > > be destroyed. > > > > So the race can happen between 3. and 5., if one thread already sees > > the new sidtab and adds a new entry there, and a second thread still > > has the reference to the old sidtab and also tires to add a new entry; > > live-converting to the new sidtab, which it doesn't expect to change > > by itself. Unfortunately I failed to realize this when reviewing the > > patch :/ > > It is possible I'm not fully understanding the problem and/or missing > an important detail - it is rather tricky code, and RCU can be very > hard to reason at times - but I think we may be able to solve this > with some lock fixes inside sidtab_context_to_sid(). Let me try to > explain to see if we are on the same page here ... > > The problem is when we have two (or more) threads trying to > add/convert the same context into a sid; the task with new_sidtab is > looking to add a new sidtab entry, while the task with old_sidtab is > looking to convert an entry in old_sidtab into a new entry in > new_sidtab. Boom. > > Looking at the code in sidtab_context_to_sid(), when we have two > sidtabs that are currently active (old_sidtab->convert pointer is > valid) and a task with old_sidtab attempts to add a new entry to both > sidtabs it first adds it to the old sidtab then it also adds it to the > new sidtab. I believe the problem is that in this case while the task > grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which > allows it to race with tasks that already see only new_sidtab. I > think adding code to sidtab_context_to_sid() which grabs the > new_sidtab->lock when adding entries to the new_sidtab *should* solve > the problem. > > Did I miss something important? ;) Sadly, yes :) Consider this scenario (assuming we fix the locking at sidtab level): If it happens that a new SID (x) is added via the new sidtab and then another one (y) via the old sidtab, to avoid clash of SIDs, we would need to leave a "hole" in the old sidtab for SID x. And this will cause trouble if the thread that has just added SID y, then tries to translate the context string corresponding to SID x (without re-taking the RCU read lock and refreshing the policy pointer). Even if we handle skipping the "holes" in the old sidtab safely, the translation would then end up adding a duplicate SID entry for the context already represented by SID x - which is not a state we want to end up in. This is why I said that to fully fix this, we'd need to have a both-ways live conversion in place. (And that already starts to feel like too much hacking for something that should probably go to stable@...) -- Ondrej Mosnacek Software Engineer, Linux Security - SELinux kernel Red Hat, Inc. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-26 11:11 ` Ondrej Mosnacek @ 2021-02-28 19:21 ` Paul Moore 2021-03-01 10:35 ` Ondrej Mosnacek 0 siblings, 1 reply; 16+ messages in thread From: Paul Moore @ 2021-02-28 19:21 UTC (permalink / raw) To: Ondrej Mosnacek Cc: Tyler Hicks, Stephen Smalley, SElinux list, Linux kernel mailing list On Fri, Feb 26, 2021 at 6:12 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > On Fri, Feb 26, 2021 at 2:07 AM Paul Moore <paul@paul-moore.com> wrote: > > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > > > After the switch to RCU, we now have: > > > 1. Start live conversion of new entries. > > > 2. Convert existing entries. > > > 3. RCU-assign the new policy pointer to selinux_state. > > > [!!! Now actually both old and new sidtab may be referenced by > > > readers, since there is no synchronization barrier previously provided > > > by the write lock.] > > > 4. Wait for synchronize_rcu() to return. > > > 5. Now only the new sidtab is visible to readers, so the old one can > > > be destroyed. > > > > > > So the race can happen between 3. and 5., if one thread already sees > > > the new sidtab and adds a new entry there, and a second thread still > > > has the reference to the old sidtab and also tires to add a new entry; > > > live-converting to the new sidtab, which it doesn't expect to change > > > by itself. Unfortunately I failed to realize this when reviewing the > > > patch :/ > > > > It is possible I'm not fully understanding the problem and/or missing > > an important detail - it is rather tricky code, and RCU can be very > > hard to reason at times - but I think we may be able to solve this > > with some lock fixes inside sidtab_context_to_sid(). Let me try to > > explain to see if we are on the same page here ... > > > > The problem is when we have two (or more) threads trying to > > add/convert the same context into a sid; the task with new_sidtab is > > looking to add a new sidtab entry, while the task with old_sidtab is > > looking to convert an entry in old_sidtab into a new entry in > > new_sidtab. Boom. > > > > Looking at the code in sidtab_context_to_sid(), when we have two > > sidtabs that are currently active (old_sidtab->convert pointer is > > valid) and a task with old_sidtab attempts to add a new entry to both > > sidtabs it first adds it to the old sidtab then it also adds it to the > > new sidtab. I believe the problem is that in this case while the task > > grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which > > allows it to race with tasks that already see only new_sidtab. I > > think adding code to sidtab_context_to_sid() which grabs the > > new_sidtab->lock when adding entries to the new_sidtab *should* solve > > the problem. > > > > Did I miss something important? ;) > > Sadly, yes :) Consider this scenario (assuming we fix the locking at > sidtab level): > > If it happens that a new SID (x) is added via the new sidtab and then > another one (y) via the old sidtab, to avoid clash of SIDs, we would > need to leave a "hole" in the old sidtab for SID x. And this will > cause trouble if the thread that has just added SID y, then tries to > translate the context string corresponding to SID x (without re-taking > the RCU read lock and refreshing the policy pointer). Even if we > handle skipping the "holes" in the old sidtab safely, the translation > would then end up adding a duplicate SID entry for the context already > represented by SID x - which is not a state we want to end up in. Ah, yes, you're right. I was only thinking about the problem of adding an entry to the old sidtab, and not the (much more likely case) of an entry being added to the new sidtab. Bummer. Thinking aloud for a moment - what if we simply refused to add new sidtab entries if the task's sidtab pointer is "old"? Common sense would tell us that this scenario should be very rare at present, and I believe the testing mentioned in this thread adds some weight to that claim. After all, this only affects tasks which entered into their RCU protected session prior to the policy load RCU sync *AND* are attempting to add a new entry to the sidtab. That *has* to be a really low percentage, especially on a system that has been up and running for some time. My gut feeling is this should be safe as well; all of the calling code should have the necessary error handling in place as there are plenty of reasons why we could normally fail to add an entry to the sidtab; memory allocation failures being the most obvious failure point I would suspect. This obvious downside to such an approach is that those operations which do meet this criteria would fail - and we should likely emit an error in this case - but is this failure really worse than any other transient kernel failure, and is attempting to mitigate this failure worth abandoning the RCU approach for the sidtab? -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-02-28 19:21 ` Paul Moore @ 2021-03-01 10:35 ` Ondrej Mosnacek 2021-03-01 14:46 ` Paul Moore 0 siblings, 1 reply; 16+ messages in thread From: Ondrej Mosnacek @ 2021-03-01 10:35 UTC (permalink / raw) To: Paul Moore Cc: Tyler Hicks, Stephen Smalley, SElinux list, Linux kernel mailing list On Sun, Feb 28, 2021 at 8:21 PM Paul Moore <paul@paul-moore.com> wrote: > On Fri, Feb 26, 2021 at 6:12 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > > On Fri, Feb 26, 2021 at 2:07 AM Paul Moore <paul@paul-moore.com> wrote: > > > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > > > > After the switch to RCU, we now have: > > > > 1. Start live conversion of new entries. > > > > 2. Convert existing entries. > > > > 3. RCU-assign the new policy pointer to selinux_state. > > > > [!!! Now actually both old and new sidtab may be referenced by > > > > readers, since there is no synchronization barrier previously provided > > > > by the write lock.] > > > > 4. Wait for synchronize_rcu() to return. > > > > 5. Now only the new sidtab is visible to readers, so the old one can > > > > be destroyed. > > > > > > > > So the race can happen between 3. and 5., if one thread already sees > > > > the new sidtab and adds a new entry there, and a second thread still > > > > has the reference to the old sidtab and also tires to add a new entry; > > > > live-converting to the new sidtab, which it doesn't expect to change > > > > by itself. Unfortunately I failed to realize this when reviewing the > > > > patch :/ > > > > > > It is possible I'm not fully understanding the problem and/or missing > > > an important detail - it is rather tricky code, and RCU can be very > > > hard to reason at times - but I think we may be able to solve this > > > with some lock fixes inside sidtab_context_to_sid(). Let me try to > > > explain to see if we are on the same page here ... > > > > > > The problem is when we have two (or more) threads trying to > > > add/convert the same context into a sid; the task with new_sidtab is > > > looking to add a new sidtab entry, while the task with old_sidtab is > > > looking to convert an entry in old_sidtab into a new entry in > > > new_sidtab. Boom. > > > > > > Looking at the code in sidtab_context_to_sid(), when we have two > > > sidtabs that are currently active (old_sidtab->convert pointer is > > > valid) and a task with old_sidtab attempts to add a new entry to both > > > sidtabs it first adds it to the old sidtab then it also adds it to the > > > new sidtab. I believe the problem is that in this case while the task > > > grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which > > > allows it to race with tasks that already see only new_sidtab. I > > > think adding code to sidtab_context_to_sid() which grabs the > > > new_sidtab->lock when adding entries to the new_sidtab *should* solve > > > the problem. > > > > > > Did I miss something important? ;) > > > > Sadly, yes :) Consider this scenario (assuming we fix the locking at > > sidtab level): > > > > If it happens that a new SID (x) is added via the new sidtab and then > > another one (y) via the old sidtab, to avoid clash of SIDs, we would > > need to leave a "hole" in the old sidtab for SID x. And this will > > cause trouble if the thread that has just added SID y, then tries to > > translate the context string corresponding to SID x (without re-taking > > the RCU read lock and refreshing the policy pointer). Even if we > > handle skipping the "holes" in the old sidtab safely, the translation > > would then end up adding a duplicate SID entry for the context already > > represented by SID x - which is not a state we want to end up in. > > Ah, yes, you're right. I was only thinking about the problem of > adding an entry to the old sidtab, and not the (much more likely case) > of an entry being added to the new sidtab. Bummer. > > Thinking aloud for a moment - what if we simply refused to add new > sidtab entries if the task's sidtab pointer is "old"? Common sense > would tell us that this scenario should be very rare at present, and I > believe the testing mentioned in this thread adds some weight to that > claim. After all, this only affects tasks which entered into their > RCU protected session prior to the policy load RCU sync *AND* are > attempting to add a new entry to the sidtab. That *has* to be a > really low percentage, especially on a system that has been up and > running for some time. My gut feeling is this should be safe as well; > all of the calling code should have the necessary error handling in > place as there are plenty of reasons why we could normally fail to add > an entry to the sidtab; memory allocation failures being the most > obvious failure point I would suspect. This obvious downside to such > an approach is that those operations which do meet this criteria would > fail - and we should likely emit an error in this case - but is this > failure really worse than any other transient kernel failure, No, I don't like this approach at all. Before the sidtab refactor, it had been done exactly this way - ENOMEM was returned while the sidtab was "frozen" (i.e. while the existing entries were being converted). And this was a real nuisance because things would fail randomly during policy reload. And it's not just unimportant explicit userspace actions that can fail. Any kind of transition can lead to a new SID being created and you'd get things like execve(), mkdir(), ... return -ENOMEM sometimes. (With a low probability, but still...) I wouldn't compare it to a memory allocation failure, which normally starts happening only when the system becomes overloaded. Here the user would *awlays* have some probability of getting this error, and they couldn't do anything about it. > and is > attempting to mitigate this failure worth abandoning the RCU approach > for the sidtab? Perhaps it wasn't clear from what I wrote, but I certainly don't want to abandon it completely. Just to revert to a safe state until we figure out how to do the RCU policy reload safely. The solution with two-way conversion seems doable, it's just not a quick and easy fix. -- Ondrej Mosnacek Software Engineer, Linux Security - SELinux kernel Red Hat, Inc. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] Race between policy reload sidtab conversion and live conversion 2021-03-01 10:35 ` Ondrej Mosnacek @ 2021-03-01 14:46 ` Paul Moore 0 siblings, 0 replies; 16+ messages in thread From: Paul Moore @ 2021-03-01 14:46 UTC (permalink / raw) To: Ondrej Mosnacek Cc: Tyler Hicks, Stephen Smalley, SElinux list, Linux kernel mailing list On Mon, Mar 1, 2021 at 5:36 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > On Sun, Feb 28, 2021 at 8:21 PM Paul Moore <paul@paul-moore.com> wrote: > > On Fri, Feb 26, 2021 at 6:12 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > > > On Fri, Feb 26, 2021 at 2:07 AM Paul Moore <paul@paul-moore.com> wrote: > > > > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: ... > > Ah, yes, you're right. I was only thinking about the problem of > > adding an entry to the old sidtab, and not the (much more likely case) > > of an entry being added to the new sidtab. Bummer. > > > > Thinking aloud for a moment - what if we simply refused to add new > > sidtab entries if the task's sidtab pointer is "old"? Common sense > > would tell us that this scenario should be very rare at present, and I > > believe the testing mentioned in this thread adds some weight to that > > claim. After all, this only affects tasks which entered into their > > RCU protected session prior to the policy load RCU sync *AND* are > > attempting to add a new entry to the sidtab. That *has* to be a > > really low percentage, especially on a system that has been up and > > running for some time. My gut feeling is this should be safe as well; > > all of the calling code should have the necessary error handling in > > place as there are plenty of reasons why we could normally fail to add > > an entry to the sidtab; memory allocation failures being the most > > obvious failure point I would suspect. This obvious downside to such > > an approach is that those operations which do meet this criteria would > > fail - and we should likely emit an error in this case - but is this > > failure really worse than any other transient kernel failure, > > No, I don't like this approach at all. Before the sidtab refactor, it > had been done exactly this way ... I recognize I probably haven't made my feelings about reverts clear, or if I have, I haven't done so recently. Let me fix that now: I *hate* them. Further I hate reverts with a deep, passionate hatred that I reserve for very few things. Maybe we have to revert this change, even though I *hate* reverts they do remain an option; you just need to be 99% sure you've exhausted all the other options first. > Perhaps it wasn't clear from what I wrote, but I certainly don't want > to abandon it completely. Just to revert to a safe state until we > figure out how to do the RCU policy reload safely. The solution with > two-way conversion seems doable, it's just not a quick and easy fix. I suggest pursuing this before the revert to see what it looks like and we can discuss it further during review. -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <20210226040542.1137-1-hdanton@sina.com>]
* Re: [BUG] Race between policy reload sidtab conversion and live conversion [not found] ` <20210226040542.1137-1-hdanton@sina.com> @ 2021-02-26 11:19 ` Ondrej Mosnacek [not found] ` <20210227023524.15844-1-hdanton@sina.com> 0 siblings, 1 reply; 16+ messages in thread From: Ondrej Mosnacek @ 2021-02-26 11:19 UTC (permalink / raw) To: Hillf Danton; +Cc: Paul Moore, Tyler Hicks, Stephen Smalley, SElinux list, LKML On Fri, Feb 26, 2021 at 5:08 AM Hillf Danton <hdanton@sina.com> wrote: > On Thu, 25 Feb 2021 20:06:45 -0500 Paul Moore wrote: > > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > > > After the switch to RCU, we now have: > > > 1. Start live conversion of new entries. > > > 2. Convert existing entries. > > > 3. RCU-assign the new policy pointer to selinux_state. > > > [!!! Now actually both old and new sidtab may be referenced by > > > readers, since there is no synchronization barrier previously provided > > > by the write lock.] > > > 4. Wait for synchronize_rcu() to return. > > > 5. Now only the new sidtab is visible to readers, so the old one can > > > be destroyed. > > > > > > So the race can happen between 3. and 5., if one thread already sees > > > the new sidtab and adds a new entry there, and a second thread still > > > has the reference to the old sidtab and also tires to add a new entry; > > > live-converting to the new sidtab, which it doesn't expect to change > > > by itself. Unfortunately I failed to realize this when reviewing the > > > patch :/ > > > > It is possible I'm not fully understanding the problem and/or missing > > an important detail - it is rather tricky code, and RCU can be very > > hard to reason at times - but I think we may be able to solve this > > with some lock fixes inside sidtab_context_to_sid(). Let me try to > > explain to see if we are on the same page here ... > > > > The problem is when we have two (or more) threads trying to > > add/convert the same context into a sid; the task with new_sidtab is > > looking to add a new sidtab entry, while the task with old_sidtab is > > looking to convert an entry in old_sidtab into a new entry in > > new_sidtab. Boom. > > > > Looking at the code in sidtab_context_to_sid(), when we have two > > sidtabs that are currently active (old_sidtab->convert pointer is > > valid) and a task with old_sidtab attempts to add a new entry to both > > sidtabs it first adds it to the old sidtab then it also adds it to the > > new sidtab. I believe the problem is that in this case while the task > > grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which > > allows it to race with tasks that already see only new_sidtab. I > > think adding code to sidtab_context_to_sid() which grabs the > > new_sidtab->lock when adding entries to the new_sidtab *should* solve > > the problem. > > > > Did I miss something important? ;) > > If the convert pointer can be derefered without lock, we can opt to > convert context after building sidtab with the risk of AB BA deadlock > cut. Below is the minimum change I can think of along your direction. We could fix this a bit more easily by just having a shared spinlock for both (well, *all*) sidtabs. Yes, we'd need to have it all the way up in selinux_state and pass it through to sidtab_init(), but IMHO that's less bad than trying to get it right with two locks. -- Ondrej Mosnacek Software Engineer, Linux Security - SELinux kernel Red Hat, Inc. ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <20210227023524.15844-1-hdanton@sina.com>]
* Re: [BUG] Race between policy reload sidtab conversion and live conversion [not found] ` <20210227023524.15844-1-hdanton@sina.com> @ 2021-03-01 14:35 ` Ondrej Mosnacek 0 siblings, 0 replies; 16+ messages in thread From: Ondrej Mosnacek @ 2021-03-01 14:35 UTC (permalink / raw) To: Hillf Danton; +Cc: Paul Moore, Tyler Hicks, Stephen Smalley, SElinux list, LKML On Sat, Feb 27, 2021 at 3:35 AM Hillf Danton <hdanton@sina.com> wrote: > On Fri, 26 Feb 2021 12:19:35 +0100 Ondrej Mosnacek wrote: > > On Fri, Feb 26, 2021 at 5:08 AM Hillf Danton <hdanton@sina.com> wrote: > > > On Thu, 25 Feb 2021 20:06:45 -0500 Paul Moore wrote: > > > > On Wed, Feb 24, 2021 at 4:35 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > > > > > After the switch to RCU, we now have: > > > > > 1. Start live conversion of new entries. > > > > > 2. Convert existing entries. > > > > > 3. RCU-assign the new policy pointer to selinux_state. > > > > > [!!! Now actually both old and new sidtab may be referenced by > > > > > readers, since there is no synchronization barrier previously provided > > > > > by the write lock.] > > > > > 4. Wait for synchronize_rcu() to return. > > > > > 5. Now only the new sidtab is visible to readers, so the old one can > > > > > be destroyed. > > > > > > > > > > So the race can happen between 3. and 5., if one thread already sees > > > > > the new sidtab and adds a new entry there, and a second thread still > > > > > has the reference to the old sidtab and also tires to add a new entry; > > > > > live-converting to the new sidtab, which it doesn't expect to change > > > > > by itself. Unfortunately I failed to realize this when reviewing the > > > > > patch :/ > > > > > > > > It is possible I'm not fully understanding the problem and/or missing > > > > an important detail - it is rather tricky code, and RCU can be very > > > > hard to reason at times - but I think we may be able to solve this > > > > with some lock fixes inside sidtab_context_to_sid(). Let me try to > > > > explain to see if we are on the same page here ... > > > > > > > > The problem is when we have two (or more) threads trying to > > > > add/convert the same context into a sid; the task with new_sidtab is > > > > looking to add a new sidtab entry, while the task with old_sidtab is > > > > looking to convert an entry in old_sidtab into a new entry in > > > > new_sidtab. Boom. > > > > > > > > Looking at the code in sidtab_context_to_sid(), when we have two > > > > sidtabs that are currently active (old_sidtab->convert pointer is > > > > valid) and a task with old_sidtab attempts to add a new entry to both > > > > sidtabs it first adds it to the old sidtab then it also adds it to the > > > > new sidtab. I believe the problem is that in this case while the task > > > > grabs the old_sidtab->lock, it never grabs the new_sidtab->lock which > > > > allows it to race with tasks that already see only new_sidtab. I > > > > think adding code to sidtab_context_to_sid() which grabs the > > > > new_sidtab->lock when adding entries to the new_sidtab *should* solve > > > > the problem. > > > > > > > > Did I miss something important? ;) > > > > > > If the convert pointer can be derefered without lock, we can opt to > > > convert context after building sidtab with the risk of AB BA deadlock > > > cut. Below is the minimum change I can think of along your direction. > > > > We could fix this a bit more easily by just having a shared spinlock > > for both (well, *all*) sidtabs. Yes, we'd need to have it all the way > > Looking forward to reading how to add another BKL, This is not even remotely comparable to the Big Kernel Lock... 99.9% of the time, there will be just one sidtab, otherwise there will be just two for a short time during the policy reload, which need to be updated in sync anyway. > > > up in selinux_state and pass it through to sidtab_init(), but IMHO > > that's less bad than trying to get it right with two locks. > > instead of a nearly pure code churn in order to walk around deadlock. I wish it were that simple, but it won't work this way. Consider a scenario like this: 1. Thread A acquires the old policy pointer, encounters a new context, adds it to the old sidtab as X, but doesn't start adding it to the new one yet. 2. Thread B acquires the new policy pointer, encounters a different new context and adds it to the new sidtab as X. [Now you end up with two different contexts assigned to the same SID, depending on the policy] 3. Now thread A continues to add the context to the new sidtab, but since it reuses the count variable from before, it overwrites the entry for X. (And even if you fixed that, re-searched the new sidtab, and added the new context to the end, you'd end up with the context having different SIDs under the old vs. new policy, which is again a messy situation.) You see, moving the sidtab code around can get you in trouble and that's why I'm advocating for using a common lock - it's much easier to prove that such change doesn't create new bugs. And as I said above, merging the locks won't really have performance consequences, so I still believe it to be the better choice here. -- Ondrej Mosnacek Software Engineer, Linux Security - SELinux kernel Red Hat, Inc. ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2021-03-01 14:48 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-02-23 21:43 [BUG] Race between policy reload sidtab conversion and live conversion Tyler Hicks 2021-02-23 21:50 ` Tyler Hicks 2021-02-23 22:36 ` Tyler Hicks 2021-02-24 0:02 ` Paul Moore 2021-02-24 9:33 ` Ondrej Mosnacek 2021-02-24 14:36 ` Tyler Hicks 2021-02-25 16:38 ` Ondrej Mosnacek 2021-02-25 16:45 ` Tyler Hicks 2021-02-25 23:27 ` Paul Moore 2021-02-26 1:06 ` Paul Moore 2021-02-26 11:11 ` Ondrej Mosnacek 2021-02-28 19:21 ` Paul Moore 2021-03-01 10:35 ` Ondrej Mosnacek 2021-03-01 14:46 ` Paul Moore [not found] ` <20210226040542.1137-1-hdanton@sina.com> 2021-02-26 11:19 ` Ondrej Mosnacek [not found] ` <20210227023524.15844-1-hdanton@sina.com> 2021-03-01 14:35 ` Ondrej Mosnacek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).