---- 在 星期二, 2021-04-13 16:47:53 Miklos Szeredi 撰写 ---- > On Tue, Apr 13, 2021 at 5:26 AM Chengguang Xu wrote: > > > > ---- 在 星期四, 2021-04-08 23:03:39 Miklos Szeredi 撰写 ---- > > > On Thu, Apr 8, 2021 at 1:40 PM Chengguang Xu wrote: > > > > > > > > ---- 在 星期四, 2021-04-08 19:29:55 Miklos Szeredi 撰写 ---- > > > > > On Thu, Apr 8, 2021 at 1:28 PM Chengguang Xu wrote: > > > > > > > > > > > > ---- 在 星期四, 2021-04-08 19:20:42 Chengguang Xu 撰写 ---- > > > > > > > In overlayfs copy-up, if open flag has O_TRUNC then upper > > > > > > > file will truncate to zero size, in this case we should check > > > > > > > VM_DENYWRITE mappings to keep compatibility with other filesystems. > > > > > > > > > > Can you provide a test case for the bug that this is fixing? > > > > > > > > > > > > > Execute binary file(keep running until open) in overlayfs which only has lower && open the binary file with flag O_RDWR|O_TRUNC > > > > > > > > Expected result: open fail with -ETXTBSY > > > > > > > > Actual result: open success > > > > > > Worse, it's possible to get a "Bus error" with just execute and write > > > on an overlayfs file, which i_writecount is supposed to protect. > > > > > > The reason is that the put_write_access() call in __vma_link_file() > > > assumes an already negative writecount, but because of the vm_file > > > shuffle in ovl_mmap() that's not guaranteed. There's even a comment > > > about exactly this situation in mmap(): > > > > > > /* ->mmap() can change vma->vm_file, but must guarantee that > > > * vma_link() below can deny write-access if VM_DENYWRITE is set > > > * and map writably if VM_SHARED is set. This usually means the > > > * new file must not have been exposed to user-space, yet. > > > */ > > > > > > The attached patch fixes this, but not your original bug. > > > > > > That could be addressed by checking the writecount on *both* lower and > > > upper for open for write/truncate. Note: this could be checked before > > > copy-up, but that's not reliable alone, because the copy up could > > > happen due to meta-data update, for example, and then the > > > open/truncate wouldn't trigger the writecount check. > > > > > > Something like the second attached patch? > > > > > > > Yeah, I noticed that too just after posted my previous patch. > > However, rethink these two cases, in practice we share lower layers > > in most use cases especially in container use case. So if we check > > VM_DENYWRITE of lower file strictly, it may cause interferes between > > container instances. Maybe only checking upper file will be better > > option? > > Yes. > > My patch to fix the SIGBUS is also incomplete as there's still a race > window between releasing the temporary writecount and the __vma_link() > that acquires the final count. This requires major surgery to fix > properly :( > Hi Miklos, How about to fix something like attached patch? Thanks, Chengguang