On 9/10/20 3:01 PM, Jens Axboe wrote:
> On 9/10/20 12:18 PM, Jens Axboe wrote:
>> On 9/10/20 7:11 AM, Jens Axboe wrote:
>>> On 9/10/20 6:37 AM, Pavel Begunkov wrote:
>>>> On 09/09/2020 19:07, Jens Axboe wrote:
>>>>> On 9/9/20 9:48 AM, Pavel Begunkov wrote:
>>>>>> On 09/09/2020 16:10, Jens Axboe wrote:
>>>>>>> On 9/9/20 1:09 AM, Pavel Begunkov wrote:
>>>>>>>> On 09/09/2020 01:54, Jens Axboe wrote:
>>>>>>>>> On 9/8/20 3:22 PM, Jens Axboe wrote:
>>>>>>>>>> On 9/8/20 2:58 PM, Pavel Begunkov wrote:
>>>>>>>>>>> On 08/09/2020 20:48, Jens Axboe wrote:
>>>>>>>>>>>> Fd instantiating commands like IORING_OP_ACCEPT now work with SQPOLL, but
>>>>>>>>>>>> we have an error in grabbing that if IOSQE_ASYNC is set. Ensure we assign
>>>>>>>>>>>> the ring fd/file appropriately so we can defer grab them.
>>>>>>>>>>>
>>>>>>>>>>> IIRC, for fcheck() in io_grab_files() to work it should be under fdget(),
>>>>>>>>>>> that isn't the case with SQPOLL threads. Am I mistaken?
>>>>>>>>>>>
>>>>>>>>>>> And it looks strange that the following snippet will effectively disable
>>>>>>>>>>> such requests.
>>>>>>>>>>>
>>>>>>>>>>> fd = dup(ring_fd)
>>>>>>>>>>> close(ring_fd)
>>>>>>>>>>> ring_fd = fd
>>>>>>>>>>
>>>>>>>>>> Not disagreeing with that, I think my initial posting made it clear
>>>>>>>>>> it was a hack. Just piled it in there for easier testing in terms
>>>>>>>>>> of functionality.
>>>>>>>>>>
>>>>>>>>>> But the next question is how to do this right...
>>>>>>>>>
>>>>>>>>> Looking at this a bit more, and I don't necessarily think there's a
>>>>>>>>> better option. If you dup+close, then it just won't work. We have no
>>>>>>>>> way of knowing if the 'fd' changed, but we can detect if it was closed
>>>>>>>>> and then we'll end up just EBADF'ing the requests.
>>>>>>>>>
>>>>>>>>> So right now the answer is that we can support this just fine with
>>>>>>>>> SQPOLL, but you better not dup and close the original fd. Which is not
>>>>>>>>> ideal, but better than NOT being able to support it.
>>>>>>>>>
>>>>>>>>> Only other option I see is to provide an io_uring_register()
>>>>>>>>> command to update the fd/file associated with it. Which may be useful,
>>>>>>>>> it allows a process to indeed do this, if it absolutely has to.
>>>>>>>>
>>>>>>>> Let's put aside such dirty hacks, at least until someone actually
>>>>>>>> needs it. Ideally, for many reasons I'd prefer to get rid of
>>>>>>>
>>>>>>> But it is actually needed, otherwise we're even more in a limbo state of
>>>>>>> "SQPOLL works for most things now, just not all". And this isn't that
>>>>>>> hard to make right - on the flush() side, we just need to park/stall the
>>>>>>
>>>>>> I understand that it isn't hard, but I just don't want to expose it to
>>>>>> the userspace, a) because it's a userspace API, so couldn't probably be
>>>>>> killed in the future, b) works around kernel's problems, and so
>>>>>> shouldn't really be exposed to the userspace in normal circumstances.
>>>>>>
>>>>>> And it's not generic enough because of a possible "many fds -> single
>>>>>> file" mapping, and there will be a lot of questions and problems.
>>>>>>
>>>>>> e.g. if a process shares an io_uring with another process, then
>>>>>> dup()+close() would require not only this hook but also additional
>>>>>> inter-process synchronisation. And so on.
>>>>>
>>>>> I think you're blowing this out of proportion. Just to restate the
>>>>
>>>> I just think that if there is a potentially cleaner solution without
>>>> involving userspace, we should try to look for it first, even if it
>>>> would take more time. That was the point.
>>>
>>> Regardless of whether or not we can eliminate that need, at least it'll
>>> be a relaxing of the restriction, not an increase of it. It'll never
>>> hurt to do an extra system call for the case where you're swapping fds.
>>> I do get your point, I just don't think it's a big deal.
>>
>> BTW, I don't see how we can ever get rid of a need to enter the kernel,
>> we'd need some chance at grabbing the updated ->files, for instance.
>> Might be possible to hold a reference to the task and grab it from
>> there, though feels a bit iffy to hold a task reference from the ring on
>> the task that holds a reference to the ring. Haven't looked too close,
>> should work though as this won't hold a file/files reference, it's just
>> a freeing reference.
>
> Sort of half assed attempt...
>
> Idea is to assign a ->files sequence before we grab files, and then
> compare with the current one once we need to use the files. If they
> mismatch, we -ECANCELED the request.
>
> For SQPOLL, don't grab ->files upfront, grab a reference to the task
> instead. Use the task reference to assign files when we need it.
>
> Adding Jann to help poke holes in this scheme. I'd be surprised if it's
> solid as-is, but hopefully we can build on this idea and get rid of the
> fcheck().

Split it into two, to make it easier to reason about. Added a few
comments, etc.

--
Jens Axboe
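
For readers following along, the sequence-check idea quoted above can be modelled in plain userspace C. This is only a rough sketch, not the patch itself: every type and function name below (model_files, model_task, model_req, model_req_prep, model_req_grab_files, model_files_changed) is hypothetical, and the assumption that the sequence is bumped whenever the task's file table changes is mine, since the thread does not spell out where the bump happens.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a task's file table (not the kernel's files_struct). */
struct model_files {
	uint64_t sequence;	/* assumed to change whenever the table changes */
};

/* Hypothetical stand-in for the submitting task. */
struct model_task {
	struct model_files *files;
};

/* The request keeps a task reference and a sampled sequence, not ->files. */
struct model_req {
	struct model_task *task;
	uint64_t files_seq;
};

/* Submission time: remember the task and the current files sequence. */
static void model_req_prep(struct model_req *req, struct model_task *task)
{
	req->task = task;
	req->files_seq = task->files->sequence;
}

/* Assumed hook: called when the task's file table is modified. */
static void model_files_changed(struct model_task *task)
{
	task->files->sequence++;
}

/*
 * Use time: look up the files through the task reference and compare
 * sequences. A mismatch cancels the request instead of using stale files.
 */
static int model_req_grab_files(const struct model_req *req,
				struct model_files **out)
{
	struct model_files *files = req->task->files;

	if (files->sequence != req->files_seq)
		return -ECANCELED;
	*out = files;
	return 0;
}
```

In this model a request prepared before a dup()+close() style change would see a different sequence when it finally runs and would fail with -ECANCELED, rather than relying on an fcheck() of the ring fd.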