Hello together, On 11/5/21 22:34, Eric W. Biederman wrote: > I have to dash so this is short. > > This is what I am thinking this change should look like. > > I am not certain this is truly reviewable as a single change, so I will > break it into a couple of smaller ones next time I get the chance. I think we should concentrate to check the commit from Alexander. What I did is to write two additional stress test apps - and now I'm able to trigger the use-after-free bug. It is much simpler, the exclusion of exit_shm() and IPC_RMID didn't work - regardless if your approach or the approach from Alexander/myself is used. > > +static inline void shm_clist_del(struct shmid_kernel *shp) > +{ > + struct task_struct *creator; > + > + rcu_read_lock(); > + creator = rcu_dereference(shp->shm_creator); > + if (creator) { > + task_lock(creator); > + list_del(&shp->shm_clist); > + task_unlock(creator); > + } > + rcu_read_unlock(); > +} > + shm_clist_del() only synchronizes against exit_shm() when shm_creator is not NULL. > + list_del(&shp->shm_clist); > + rcu_assign_pointer(shp->shm_creator, NULL); > + We set shm_creator to NULL -> no more synchronization. Now IPC_RMID can run in parallel - regardless if we test for list_empty() or shm_creator. > + > + /* Guarantee shp lives after task_lock is dropped */ > + ipc_getref(&shp->shm_perm); > + task_lock() doesn't help: As soon as shm_creator is set to NULL, IPC_RMID won't acquire task_lock() anymore. Thus shp can disappear before we arrive at this ipc_getref. [Yes, I think I have introduced this bug. ] Corrected version attached. I'll reboot and retest the patch, then I would send it to akpm as replacement for current patch in mmotm. --     Manfred