On Sun, Feb 03, 2019 at 03:51:53PM +0200, Amir Goldstein wrote:
> OK, what I don't understand and requires debugging is that the print of
> (realfile, IS_ERR(realfile) ? 0 : realfile->f_flags) suggests that realfile
> is not an error value and realfile->f_flags are 0.

Just got back to debugging this properly. I think you got confused by
the same thing I did when first looking at the code: realfile actually
_is_ an error in this case, so the output is correct (I personally
probably also got confused by the realinode/realfile variable names).

So after debugging this further (and totally digging in the wrong places
at first) I found that the actual problem here is the O_NOATIME flag
that is passed to the underlying file system.

If you look at may_open() in fs/namei.c, there is an
inode_owner_or_capable() check that applies when O_NOATIME is set.
Reading a file that you don't own but have read permission on is
perfectly fine, but because O_NOATIME is passed, the open() fails.

Now in normal situations where the overlayfs is mounted as root, this
shouldn't be a problem, but as soon as you have a networked file system,
things go bad.

That's what happened in our case, where we have a 9p file system mounted
in a guest VM with an overlayfs lowerdir on top of it. If the file owner
on the host is the same as the uid of the qemu process, the open() works
correctly. If it's not, the open() fails with EPERM on the host side
(even though you have read access).

The attached patch simply removes the O_NOATIME flag, which fixes the
issue.

I originally thought about adding a condition on whether to add the
flag, but I only see two options here, which IMHO are bad in their own
right:

  * Use inode_owner_or_capable() to check whether to add O_NOATIME,
    which has the downside that it will not work with networked file
    systems where you map different users (I've tested this already
    with a different patch[1]).
  * Check for failure of open_with_fake_path() and retry without
    O_NOATIME, which *could* be an option, but I think that might come
    with a performance penalty.

Actually, a third option would be to just ignore O_NOATIME in
fs/namei.c instead of returning -EPERM, but I think that could open up
a whole range of other bugs.

In summary, IMHO just removing O_NOATIME is the most sensible option,
because it doesn't cause problems with networked file systems and also
leaves the atime/noatime decision to the administrator of the
corresponding system.

Or is there something I've missed where one is in dire need of
O_NOATIME?

a!

-- 
aszlig
Universal dilettante
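
P.S.: To illustrate the EPERM behaviour and the "retry without
O_NOATIME" option, here is a rough userspace sketch. It is not kernel
code and not the attached patch; open_noatime_fallback() is a made-up
name, and it assumes Linux with glibc and flags that don't include
O_CREAT (no mode argument is passed).

```c
#define _GNU_SOURCE		/* exposes O_NOATIME in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper, not an existing API.
 *
 * Try to open with O_NOATIME first. If that is refused with EPERM
 * (the caller is neither the file owner nor privileged), retry the
 * open without the flag. Any other error is returned unchanged.
 */
static int open_noatime_fallback(const char *path, int flags)
{
	int fd = open(path, flags | O_NOATIME);

	if (fd < 0 && errno == EPERM)
		fd = open(path, flags & ~O_NOATIME);

	return fd;
}
```

The second open() only happens on the failure path, which is where the
extra cost of this approach would come from.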