I don't think fork() would work with userspace where all buffers are
shared. It certainly doesn't work now. The driver needs to be notified
that a buffer or texture is shared to ensure data coherency between
processes, and the driver must execute decompression and other render
passes when a buffer or texture is being shared for the first time.
Those aren't called when fork() is called.

Marek

On Mon, Jan 17, 2022 at 9:34 AM Felix Kuehling wrote:
> On 2022-01-17 at 9:21 a.m., Christian König wrote:
> > On 2022-01-17 at 15:17, Felix Kuehling wrote:
> >> On 2022-01-17 at 6:44 a.m., Christian König wrote:
> >>> On 2022-01-14 at 18:40, Felix Kuehling wrote:
> >>>> On 2022-01-14 at 12:26 p.m., Christian König wrote:
> >>>>> On 2022-01-14 at 17:44, Daniel Vetter wrote:
> >>>>>> Top post because I tried to catch up on the entire discussion
> >>>>>> here.
> >>>>>>
> >>>>>> So fundamentally I'm not opposed to just closing this fork()
> >>>>>> hole once and for all. The thing that worries me from an
> >>>>>> upstream/platform pov is really only if we don't do it
> >>>>>> consistently across all drivers.
> >>>>>>
> >>>>>> So maybe as an idea:
> >>>>>> - Do the original patch, but not just for ttm but all gem
> >>>>>>   rendernode drivers at least (or maybe even all gem drivers,
> >>>>>>   no idea), with the below discussion cleaned up as
> >>>>>>   justification.
> >>>>> I know of at least one use case which this will break.
> >>>>>
> >>>>> A couple of years back we had a discussion on the Mesa mailing
> >>>>> list because (IIRC) Marek introduced a background thread to push
> >>>>> command submissions to the kernel.
> >>>>>
> >>>>> That broke because some compositor used to initialize OpenGL and
> >>>>> then do a fork(). This indeed worked previously (no GPUVM at
> >>>>> that time), but with the addition of the background thread it
> >>>>> obviously broke.
> >>>>>
> >>>>> The conclusion back then was that the compositor is broken and
> >>>>> needs fixing, but it still essentially means that there could be
> >>>>> people out there with really old userspace where this setting
> >>>>> would just break the desktop.
> >>>>>
> >>>>> I'm not really against that change either, but at least in
> >>>>> theory we could make fork() work perfectly fine even with VMs
> >>>>> and background threads.
> >>>> You may regret this if you ever try to build a shared virtual
> >>>> address space between GPU and CPU. Then you have two processes
> >>>> (parent and child) sharing the same render context and GPU VM
> >>>> address space. But the CPU address spaces are different. You
> >>>> can't maintain consistent shared virtual address spaces for both
> >>>> processes when the GPU address space is shared between them.
> >>> That's actually not much of a problem.
> >>>
> >>> All you need to do is use pthread_atfork() and take the
> >>> appropriate action in parent/child to clean up your context:
> >>> https://man7.org/linux/man-pages/man3/pthread_atfork.3.html
> >> Thunk already does that. However, it's not foolproof.
> >> pthread_atfork handlers aren't called when the process is forked
> >> with a clone call.
> >
> > Yeah, but that's perfectly intentional. clone() is usually used to
> > create threads.
>
> clone() can be used to create new processes. Maybe not the common use
> today.
>
> >>> The rest is just to make sure that all shared and all private data
> >>> are kept separate all the time. Sharing virtual memory has been
> >>> done this way for decades; it's just that nobody ever did it with
> >>> a stateful device like GPUs.
> >> My concern is not with sharing or not sharing data. It's with
> >> sharing the address space itself. If you share the render node, you
> >> share the GPU virtual address space. However, the CPU address space
> >> is not shared between parent and child.
> >> That's a fundamental mismatch between the CPU world and the
> >> current GPU driver implementation.
> >
> > Correct, but even that is easily solvable. As I said before, you
> > can hang this state on a VMA and let it be cloned together with the
> > CPU address space.
>
> I'm not following. The address space I'm talking about is struct
> amdgpu_vm. It's associated with the render node file descriptor.
> Inheriting and using that file descriptor in the child inherits the
> amdgpu_vm. I don't see how you can hang that state on any one VMA.
>
> To be consistent with the CPU, you'd need to clone the GPU address
> space (struct amdgpu_vm) in the child process. That means you need a
> new render node file descriptor that imports all the BOs from the
> parent address space. That's a bunch of extra work to fork a process,
> which you're proposing to immediately undo with an atfork handler. So
> I really don't see the point.
>
> Regards,
>   Felix
>
>
> > Since VMAs are informed about their cloning (in contrast to file
> > descriptors), it's trivial to even just clone kernel data on first
> > access.
> >
> > Regards,
> > Christian.
> >
> >>
> >> Regards,
> >>   Felix
> >>
> >>
> >>> Regards,
> >>> Christian.
> >>>
> >>>> Regards,
> >>>> Felix
> >>>>