* Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) @ 2003-08-22 14:40 James Bottomley 2003-08-22 16:14 ` David S. Miller 0 siblings, 1 reply; 30+ messages in thread From: James Bottomley @ 2003-08-22 14:40 UTC (permalink / raw) To: Linux Kernel, PARISC list; +Cc: davem, drepper [-- Attachment #1: Type: text/plain, Size: 1298 bytes --] This test essentially opens a file (via open(2)), writes something, opens it via a mmaped file object *read only* (via fopen(...,"rm)) reads what was writtent, writes some more and reads it via the mmaped file object. This last read fails to get the data on parisc. The problem is that our CPU cache is virtually indexed, and the page the write is storing the data to (in the buffer cache) and the page it is mmapped to have the same physical, but different virtual addresses. We need the write() to trigger a cache update via flush_dcache_page to get the virtually indexed cache in sync. The reason this doesn't happen is because the mapping is not on the mmap_shared list that flush_dcache_page() updates. And the reason it's not on the correct list is because there's a check in mm/mmap.c:do_mmap_pgoff() that drops the VM_SHARED flag on the mapping if the file wasn't opened for writing (about line 541). Semantically, it seems that whether the mmaping sees a write or not on a different descriptor shouldn't depend on whether the underlying file was opened read only or read write, so I think the glibc test is correct, and we should keep the VM_SHARED flag even if the underlying file was opened read only. The patch is attached (and makes the test pass on parisc). Comments? James [-- Attachment #2: tmp.diff --] [-- Type: text/plain, Size: 363 bytes --] ===== mm/mmap.c 1.89 vs edited ===== --- 1.89/mm/mmap.c Thu Jul 10 21:46:52 2003 +++ edited/mm/mmap.c Fri Aug 22 09:36:32 2003 @@ -539,7 +539,7 @@ vm_flags |= VM_SHARED | VM_MAYSHARE; if (!(file->f_mode & FMODE_WRITE)) - vm_flags &= ~(VM_MAYWRITE | VM_SHARED); + vm_flags &= ~VM_MAYWRITE; /* fall through */ case MAP_PRIVATE: ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 14:40 Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) James Bottomley @ 2003-08-22 16:14 ` David S. Miller 2003-08-22 16:34 ` [parisc-linux] " Matthew Wilcox 0 siblings, 1 reply; 30+ messages in thread From: David S. Miller @ 2003-08-22 16:14 UTC (permalink / raw) To: James Bottomley; +Cc: linux-kernel, parisc-linux, drepper On 22 Aug 2003 09:40:37 -0500 James Bottomley <James.Bottomley@SteelEye.com> wrote: > This test essentially opens a file (via open(2)), writes something, > opens it via a mmaped file object *read only* (via fopen(...,"rm)) reads > what was writtent, writes some more and reads it via the mmaped file > object. > > This last read fails to get the data on parisc. The problem is that our > CPU cache is virtually indexed, and the page the write is storing the > data to (in the buffer cache) and the page it is mmapped to have the > same physical, but different virtual addresses. We need the write() to > trigger a cache update via flush_dcache_page to get the virtually > indexed cache in sync. > > The reason this doesn't happen is because the mapping is not on the > mmap_shared list that flush_dcache_page() updates. flush_dcache_page() checks both the shared and non-shared mmap lists, so if it is on _either_ list it is flushed. It does not check only the shared list. The VM_SHARED change you are proposing is definitely wrong. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 16:14 ` David S. Miller @ 2003-08-22 16:34 ` Matthew Wilcox 2003-08-22 16:39 ` David S. Miller 2003-08-22 16:42 ` Russell King 0 siblings, 2 replies; 30+ messages in thread From: Matthew Wilcox @ 2003-08-22 16:34 UTC (permalink / raw) To: David S. Miller; +Cc: James Bottomley, linux-kernel, parisc-linux, drepper On Fri, Aug 22, 2003 at 09:14:47AM -0700, David S. Miller wrote: > On 22 Aug 2003 09:40:37 -0500 > James Bottomley <James.Bottomley@SteelEye.com> wrote: > > The reason this doesn't happen is because the mapping is not on the > > mmap_shared list that flush_dcache_page() updates. > > flush_dcache_page() checks both the shared and non-shared mmap lists, > so if it is on _either_ list it is flushed. It does not check only > the shared list. Gah, that's going to get really inefficient. I still think we want to split flush_dcache_page() into two operations -- flush_dcache_user() and flush_dcache_kernel(). flush_dcache_user() would flush this specific user mapping back to ram and flush_dcache_kernel() would flush the kernel mapping. Obviously we'd still want to have flush_dcache_page() as there are instances when you want to flush all user mappings and the kernel mapping back to ram. > The VM_SHARED change you are proposing is definitely wrong. Why is it wrong? Why should whether-or-not a mapping is read-only affect whether it's mapped shared? I can't see anything in SuS v3 that suggests we should do this. -- "It's not Hollywood. War is real, war is primarily not about defeat or victory, it is about death. I've seen thousands and thousands of dead bodies. Do you think I want to have an academic debate on this subject?" -- Robert Fisk ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 16:34 ` [parisc-linux] " Matthew Wilcox @ 2003-08-22 16:39 ` David S. Miller 2003-08-22 17:41 ` Matthew Wilcox 2003-08-22 16:42 ` Russell King 1 sibling, 1 reply; 30+ messages in thread From: David S. Miller @ 2003-08-22 16:39 UTC (permalink / raw) To: Matthew Wilcox; +Cc: James.Bottomley, linux-kernel, parisc-linux, drepper On Fri, 22 Aug 2003 17:34:29 +0100 Matthew Wilcox <willy@debian.org> wrote: > On Fri, Aug 22, 2003 at 09:14:47AM -0700, David S. Miller wrote: > > On 22 Aug 2003 09:40:37 -0500 > > flush_dcache_page() checks both the shared and non-shared mmap lists, > > so if it is on _either_ list it is flushed. It does not check only > > the shared list. > > Gah, that's going to get really inefficient. I still think we want to > split flush_dcache_page() into two operations -- flush_dcache_user() and > flush_dcache_kernel(). flush_dcache_user() would flush this specific > user mapping back to ram and flush_dcache_kernel() would flush the > kernel mapping. Obviously we'd still want to have flush_dcache_page() > as there are instances when you want to flush all user mappings and the > kernel mapping back to ram. flush_dcache_page() works only on kernel pages. It is defined to execute when the kernel executes store instructions into a page. Therefore splitting it into a "user" part makes absolutely no sense. > > The VM_SHARED change you are proposing is definitely wrong. > > Why is it wrong? Why should whether-or-not a mapping is read-only affect > whether it's mapped shared? I can't see anything in SuS v3 that suggests > we should do this. MAP_SHARED has no meaning if the mapping isn't writable. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 16:39 ` David S. Miller @ 2003-08-22 17:41 ` Matthew Wilcox 2003-08-22 17:36 ` David S. Miller 0 siblings, 1 reply; 30+ messages in thread From: Matthew Wilcox @ 2003-08-22 17:41 UTC (permalink / raw) To: David S. Miller Cc: Matthew Wilcox, James.Bottomley, linux-kernel, parisc-linux, drepper On Fri, Aug 22, 2003 at 09:39:00AM -0700, David S. Miller wrote: > flush_dcache_page() works only on kernel pages. > > It is defined to execute when the kernel executes store instructions > into a page. > > Therefore splitting it into a "user" part makes absolutely no > sense. Uhm. So what happens when the user has stored into the page and now the kernel wants to read from it? There's still data in the cache for the user mapping that's non-coherent with the kernel mapping. > > > The VM_SHARED change you are proposing is definitely wrong. > > > > Why is it wrong? Why should whether-or-not a mapping is read-only affect > > whether it's mapped shared? I can't see anything in SuS v3 that suggests > > we should do this. > > MAP_SHARED has no meaning if the mapping isn't writable. Sure it does. It affects whether other writes to that page show up in the shared read-only mapping. -- "It's not Hollywood. War is real, war is primarily not about defeat or victory, it is about death. I've seen thousands and thousands of dead bodies. Do you think I want to have an academic debate on this subject?" -- Robert Fisk ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 17:41 ` Matthew Wilcox @ 2003-08-22 17:36 ` David S. Miller 2003-08-22 18:01 ` David S. Miller 0 siblings, 1 reply; 30+ messages in thread From: David S. Miller @ 2003-08-22 17:36 UTC (permalink / raw) To: Matthew Wilcox Cc: willy, James.Bottomley, linux-kernel, parisc-linux, drepper On Fri, 22 Aug 2003 18:41:03 +0100 Matthew Wilcox <willy@debian.org> wrote: > Uhm. So what happens when the user has stored into the page and now > the kernel wants to read from it? There's still data in the cache for > the user mapping that's non-coherent with the kernel mapping. I see. This causes the page cache read flush_dcache_page() call not to trigger. I was very confused by the fact that this bug was explained by saying that "the shared mmap list that flush_dcache_page() checks". So the idea is that VM_SHARED should be set based upon whether we mmap() the thing writable _not_ whether the open() was done with write permission enabled. Yes, I agree with that. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 17:36 ` David S. Miller @ 2003-08-22 18:01 ` David S. Miller 2003-08-22 18:34 ` Hugh Dickins 0 siblings, 1 reply; 30+ messages in thread From: David S. Miller @ 2003-08-22 18:01 UTC (permalink / raw) To: David S. Miller Cc: willy, James.Bottomley, linux-kernel, parisc-linux, drepper On Fri, 22 Aug 2003 10:36:34 -0700 "David S. Miller" <davem@redhat.com> wrote: > On Fri, 22 Aug 2003 18:41:03 +0100 > Matthew Wilcox <willy@debian.org> wrote: > > > Uhm. So what happens when the user has stored into the page and now > > the kernel wants to read from it? There's still data in the cache for > > the user mapping that's non-coherent with the kernel mapping. > > I see. This causes the page cache read flush_dcache_page() call > not to trigger. Wait, I'm confused again. How can the user "write" to the mmap()'d side if PROT_WRITE was not specified? That is the only case in which the proposed patch could make a difference, we check this: switch (flags & MAP_TYPE) { case MAP_SHARED: if ((prot&PROT_WRITE) && !(file->f_mode&FMODE_WRITE)) return -EACCES; Therefore if the user can write to the page, file->f_mode will have the write bit set too. So the proposed patch looks bogus to me. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 18:01 ` David S. Miller @ 2003-08-22 18:34 ` Hugh Dickins 2003-08-22 18:31 ` David S. Miller 2003-08-22 18:41 ` James Bottomley 0 siblings, 2 replies; 30+ messages in thread From: Hugh Dickins @ 2003-08-22 18:34 UTC (permalink / raw) To: David S. Miller Cc: willy, James.Bottomley, linux-kernel, parisc-linux, drepper On Fri, 22 Aug 2003, David S. Miller wrote: > > So the proposed patch looks bogus to me. And to me. If VM_SHARED is set, then __vma_link_file puts the vma on on i_mmap_shared. If VM_SHARED is not set, it puts the vma on i_mmap. flush_dcache_page treats i_mmap_shared and i_mmap lists equally. Might the problem be in parisc's __flush_dcache_page, which only examines i_mmap_shared? Though it's odd, I'm not keen on changing do_mmap_pgoff's usage of VM_SHARED in a hurry: it's worked like that for years, and other things (I forget) depend on it. Hugh ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 18:34 ` Hugh Dickins @ 2003-08-22 18:31 ` David S. Miller 2003-08-22 18:56 ` James Bottomley 2003-08-22 18:41 ` James Bottomley 1 sibling, 1 reply; 30+ messages in thread From: David S. Miller @ 2003-08-22 18:31 UTC (permalink / raw) To: Hugh Dickins; +Cc: willy, James.Bottomley, linux-kernel, parisc-linux, drepper On Fri, 22 Aug 2003 19:34:41 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote: > And to me. If VM_SHARED is set, then __vma_link_file puts the vma on > on i_mmap_shared. If VM_SHARED is not set, it puts the vma on i_mmap. > flush_dcache_page treats i_mmap_shared and i_mmap lists equally. But file system page cache writes only call flush_dache_page() if the page has a non-empty i_mmap_shared list. > Might the problem be in parisc's __flush_dcache_page, > which only examines i_mmap_shared? No, it examines both lists, the problem is not there. The issue seems to be some confusion about whether the test program in question is actually mmap()'ing the area with PROT_WRITE set, and if so why the test case isn't passing because in such a case the page will have a non-empty i_mmap_shared list. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 18:31 ` David S. Miller @ 2003-08-22 18:56 ` James Bottomley 2003-08-22 19:19 ` David S. Miller 0 siblings, 1 reply; 30+ messages in thread From: James Bottomley @ 2003-08-22 18:56 UTC (permalink / raw) To: David S. Miller; +Cc: Hugh Dickins, willy, Linux Kernel, PARISC list, drepper On Fri, 2003-08-22 at 13:31, David S. Miller wrote: > On Fri, 22 Aug 2003 19:34:41 +0100 (BST) > Hugh Dickins <hugh@veritas.com> wrote: > > > And to me. If VM_SHARED is set, then __vma_link_file puts the vma on > > on i_mmap_shared. If VM_SHARED is not set, it puts the vma on i_mmap. > > flush_dcache_page treats i_mmap_shared and i_mmap lists equally. > > But file system page cache writes only call flush_dache_page() > if the page has a non-empty i_mmap_shared list. Hmm, but if it does that then the glibc bug test should show up on sparc because the i_mmap_shared list is empty if we only do MAP_SHARED of read only files. James ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 18:56 ` James Bottomley @ 2003-08-22 19:19 ` David S. Miller 2003-08-22 22:27 ` James Bottomley 0 siblings, 1 reply; 30+ messages in thread From: David S. Miller @ 2003-08-22 19:19 UTC (permalink / raw) To: James Bottomley; +Cc: hugh, willy, linux-kernel, parisc-linux, drepper On 22 Aug 2003 13:56:06 -0500 James Bottomley <James.Bottomley@SteelEye.com> wrote: > On Fri, 2003-08-22 at 13:31, David S. Miller wrote: > > On Fri, 22 Aug 2003 19:34:41 +0100 (BST) > > Hugh Dickins <hugh@veritas.com> wrote: > > > > > And to me. If VM_SHARED is set, then __vma_link_file puts the vma on > > > on i_mmap_shared. If VM_SHARED is not set, it puts the vma on i_mmap. > > > flush_dcache_page treats i_mmap_shared and i_mmap lists equally. > > > > But file system page cache writes only call flush_dache_page() > > if the page has a non-empty i_mmap_shared list. > > Hmm, but if it does that then the glibc bug test should show up on sparc > because the i_mmap_shared list is empty if we only do MAP_SHARED of read > only files. Sparc64's alias'able caches are 1) write-through and 2) quite small. I think I begin to see the issue clearly now. But you cannot do the VM_SHARED change without an audit first. Lots of code thinks that VM_SHARED means someone maybe wrote to the page through a mmap(). For example look at how filemap sync interprets this flag bit. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 19:19 ` David S. Miller @ 2003-08-22 22:27 ` James Bottomley 2003-08-22 22:41 ` David S. Miller 0 siblings, 1 reply; 30+ messages in thread From: James Bottomley @ 2003-08-22 22:27 UTC (permalink / raw) To: David S. Miller; +Cc: hugh, willy, Linux Kernel, PARISC list, drepper On Fri, 2003-08-22 at 14:19, David S. Miller wrote: > Sparc64's alias'able caches are 1) write-through and 2) quite small. > > I think I begin to see the issue clearly now. > > But you cannot do the VM_SHARED change without an audit first. > Lots of code thinks that VM_SHARED means someone maybe wrote > to the page through a mmap(). For example look at how filemap > sync interprets this flag bit. Yes, the issue seems to be that the flush_dcache_page() was implemented with the thought that the caches of the shared mappings may contain modified data that needs to be flushed to the aliased page. The opposite property: that the caches of the aliased page need to be invalidated because someone else changed data in the aliased page seems to work as a byproduct of the above implementation. But some of the checks for !list_empty(&mapping->i_shared) are going to prevent the necessary invalidations on read only shared mappings...which was the initial problem. The only issue I can see with not dropping VM_SHARED for read only shared mappings is that we do spurious (but harmless) flushe_dcache_page() on reads. There also appears to be a lurking prob lem in do_mremap, where it keys off the VM_SHARED flag to set the MAP_SHARED flag for get_unmapped_area. That's going to cause us a problem on parisc because SHARED pages need to obey slightly stricter alignment constraints All in all, I think not dropping VM_SHARED on read only shared mappings is the right thing to do. Do you need a more detailed audit? James ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 22:27 ` James Bottomley @ 2003-08-22 22:41 ` David S. Miller 2003-08-23 1:09 ` James Bottomley 0 siblings, 1 reply; 30+ messages in thread From: David S. Miller @ 2003-08-22 22:41 UTC (permalink / raw) To: James Bottomley; +Cc: hugh, willy, linux-kernel, parisc-linux, drepper On 22 Aug 2003 17:27:33 -0500 James Bottomley <James.Bottomley@SteelEye.com> wrote: > Yes, the issue seems to be that the flush_dcache_page() was implemented > with the thought that the caches of the shared mappings may contain > modified data that needs to be flushed to the aliased page. > > The opposite property: that the caches of the aliased page need to be > invalidated because someone else changed data in the aliased page seems > to work as a byproduct of the above implementation. > > But some of the checks for !list_empty(&mapping->i_shared) are going to > prevent the necessary invalidations on read only shared mappings...which > was the initial problem. The theory of operation is that there are two "classes" of mappings for a page, the implicit kernel mapping and all user mappings. The goal is to flush out one class from the cache when the other one writes to such a page. When a write() system call occurs, the kernel "class" is writing to the page so all user mappings (shared or not!) need to be flushed out. When a read() system call occurs, and shared+writable mmap()'s of the page exist, we must flush the user mapping "class" before the kernel "class" tries to read from that page. You cannot avoid doing things exactly as I've just described without allowing bad aliases to be in the cache and corrupt data. Your test case is essentially, annotated with what the kernel should do at each step: static const char *test1 = "Line the first\n"; static const char *test2 = "Line the second\n"; temp_fd = open(O_RDWR); write(temp_fd, test1, sizeof test1 - 1); No mmaps of this page, so no flush_dcache_page() call. ... fd = open(O_RDONLY); p = mmap(fd, ... PROT_READ ...); Not writable, not added to i_mmap_shared list, instead it is added to normal i_mmap list. memcpy(tmp_buf, p, sizeof test1 - 1); p += sizeof test1 - 1; if (strcmp(tmp_buf, test1)) BUG(); ... write(temp_fd, test2, sizeof test2 - 1); Mmaps of this page exist, flush_dcache_page() call is made. memcpy(tmp_buf, p, sizeof test2 - 1); if (strcmp(tmp_buf, test2)) BUG(); And thus memcpy() sees correct data. I think on parisc you are trying to avoid the write() case of the cache flush for non-shared mmap()s, and sorry you really can't do this, again this is: When a write() system call occurs, the kernel "class" is writing to the page so all user mappings (shared or not!) need to be flushed out. If your flush_dcache_page() is not doing this, it's no wonder the test case fails for you. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 22:41 ` David S. Miller @ 2003-08-23 1:09 ` James Bottomley 2003-08-23 7:22 ` Hugh Dickins 2003-08-23 21:43 ` David S. Miller 0 siblings, 2 replies; 30+ messages in thread From: James Bottomley @ 2003-08-23 1:09 UTC (permalink / raw) To: David S. Miller; +Cc: hugh, willy, Linux Kernel, PARISC list, drepper On Fri, 2003-08-22 at 17:41, David S. Miller wrote: > I think on parisc you are trying to avoid the write() case > of the cache flush for non-shared mmap()s, and sorry you > really can't do this, again this is: > > When a write() system call occurs, the kernel "class" is writing to > the page so all user mappings (shared or not!) need to be flushed > out. > > If your flush_dcache_page() is not doing this, it's no wonder > the test case fails for you. Yes, that's precisely what we're trying to do. Our problem is that we have to issue the flush to all the aliased addresses (one cache line at a time) which is phenomenally expensive. What we were hoping is that we could rely on this little property of mmap: MAP_PRIVATE Create a private copy-on-write mapping. Stores to the region do not affect the original file. It is unspecified whether changes made to the file after the mmap call are visible in the mapped region. To avoid having to flush the non-shared mappings (basically on parisc if you write to a file backing a MAP_PRIVATE mapping then we don't guarantee you see the update). I suppose if we had a way of telling if any of the i_mmap list members were really MAP_SHARED semantics mappings, then we could alter our flush_dcache_page() implementation to work. James ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-23 1:09 ` James Bottomley @ 2003-08-23 7:22 ` Hugh Dickins 2003-08-23 15:59 ` James Bottomley 2003-08-23 21:44 ` David S. Miller 2003-08-23 21:43 ` David S. Miller 1 sibling, 2 replies; 30+ messages in thread From: Hugh Dickins @ 2003-08-23 7:22 UTC (permalink / raw) To: James Bottomley Cc: David S. Miller, willy, Linux Kernel, PARISC list, drepper On 22 Aug 2003, James Bottomley wrote: > > I suppose if we had a way of telling if any of the i_mmap list members > were really MAP_SHARED semantics mappings, then we could alter our > flush_dcache_page() implementation to work. Good idea. It's VM_MAYSHARE you need to check for. Hugh ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-23 7:22 ` Hugh Dickins @ 2003-08-23 15:59 ` James Bottomley 2003-08-23 21:44 ` David S. Miller 1 sibling, 0 replies; 30+ messages in thread From: James Bottomley @ 2003-08-23 15:59 UTC (permalink / raw) To: Hugh Dickins; +Cc: David S. Miller, willy, Linux Kernel, PARISC list, drepper [-- Attachment #1: Type: text/plain, Size: 415 bytes --] On Sat, 2003-08-23 at 02:22, Hugh Dickins wrote: > Good idea. It's VM_MAYSHARE you need to check for. OK, to get all this to work, there's a corner case in do_mremap() that I need to be fixed: When choosing the flags for the new area, it keys off VM_SHARED to determine whether MAP_SHARED is passed to the mapping. It has to key of VM_MAYSHARE to preserve VM_MAYSHARE on the new mapping. Patch below. James [-- Attachment #2: tmp.diff --] [-- Type: text/plain, Size: 421 bytes --] ===== mremap.c 1.32 vs edited ===== --- 1.32/mm/mremap.c Thu Aug 7 12:29:10 2003 +++ edited/mremap.c Sat Aug 23 10:54:21 2003 @@ -420,7 +420,7 @@ if (flags & MREMAP_MAYMOVE) { if (!(flags & MREMAP_FIXED)) { unsigned long map_flags = 0; - if (vma->vm_flags & VM_SHARED) + if (vma->vm_flags & VM_MAYSHARE) map_flags |= MAP_SHARED; new_addr = get_unmapped_area(vma->vm_file, 0, new_len, ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-23 7:22 ` Hugh Dickins 2003-08-23 15:59 ` James Bottomley @ 2003-08-23 21:44 ` David S. Miller 1 sibling, 0 replies; 30+ messages in thread From: David S. Miller @ 2003-08-23 21:44 UTC (permalink / raw) To: Hugh Dickins; +Cc: James.Bottomley, willy, linux-kernel, parisc-linux, drepper On Sat, 23 Aug 2003 08:22:19 +0100 (BST) Hugh Dickins <hugh@veritas.com> wrote: > On 22 Aug 2003, James Bottomley wrote: > > > > I suppose if we had a way of telling if any of the i_mmap list members > > were really MAP_SHARED semantics mappings, then we could alter our > > flush_dcache_page() implementation to work. > > Good idea. It's VM_MAYSHARE you need to check for. Nope, please see my other email for why all of these ideas simply will not work. If the first fault-in of a MAP_PRIVATE page is a read, it's just like a MAP_SHARED read-only page until the first write occurs. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-23 1:09 ` James Bottomley 2003-08-23 7:22 ` Hugh Dickins @ 2003-08-23 21:43 ` David S. Miller 2003-08-23 22:21 ` James Bottomley 1 sibling, 1 reply; 30+ messages in thread From: David S. Miller @ 2003-08-23 21:43 UTC (permalink / raw) To: James Bottomley; +Cc: hugh, willy, linux-kernel, parisc-linux, drepper On 22 Aug 2003 20:09:30 -0500 James Bottomley <James.Bottomley@SteelEye.com> wrote: > What we were hoping is that we could rely on this little property of > mmap: > > MAP_PRIVATE > Create a private copy-on-write mapping. Stores > to the region do not affect the original file. > It is unspecified whether changes made to the > file after the mmap call are visible in the > mapped region. > > To avoid having to flush the non-shared mappings (basically on parisc if > you write to a file backing a MAP_PRIVATE mapping then we don't > guarantee you see the update). > > I suppose if we had a way of telling if any of the i_mmap list members > were really MAP_SHARED semantics mappings, then we could alter our > flush_dcache_page() implementation to work. I thought about this very deeply last night and this morning. And what you're trying to optimize won't work. Here is why. If the first access to a MAP_PRIVATE mapping of a page is a read, we'll use the page-cache page. This means that, with your optimization, during this time if another cpu write()`s into the page we'll lose the data update. Sorry :( ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-23 21:43 ` David S. Miller @ 2003-08-23 22:21 ` James Bottomley 2003-08-23 22:51 ` David S. Miller 2003-08-23 22:53 ` David S. Miller 0 siblings, 2 replies; 30+ messages in thread From: James Bottomley @ 2003-08-23 22:21 UTC (permalink / raw) To: David S. Miller; +Cc: hugh, willy, Linux Kernel, PARISC list, drepper On Sat, 2003-08-23 at 16:43, David S. Miller wrote: > On 22 Aug 2003 20:09:30 -0500 > James Bottomley <James.Bottomley@SteelEye.com> wrote: > > > What we were hoping is that we could rely on this little property of > > mmap: > > > > MAP_PRIVATE > > Create a private copy-on-write mapping. Stores > > to the region do not affect the original file. > > It is unspecified whether changes made to the > > file after the mmap call are visible in the > > mapped region. > > > > To avoid having to flush the non-shared mappings (basically on parisc if > > you write to a file backing a MAP_PRIVATE mapping then we don't > > guarantee you see the update). > > > > I suppose if we had a way of telling if any of the i_mmap list members > > were really MAP_SHARED semantics mappings, then we could alter our > > flush_dcache_page() implementation to work. > > I thought about this very deeply last night and this morning. > And what you're trying to optimize won't work. Here is why. > > If the first access to a MAP_PRIVATE mapping of a page is a read, > we'll use the page-cache page. This means that, with your > optimization, during this time if another cpu write()`s into the > page we'll lose the data update. > > Sorry :( Could you elaborate some more? I agree that the MAP_PRIVATE mapping may not see cpu1's write because of cache incoherencies (but that's what I believe is covered by the `unspecified' bit of the MAP_PRIVATE definition above). MAP_PRIVATE is COW (i.e. the page is marked read only while it's shared), so there can be no data to flush in the cache for the page of the MAP_PRIVATE process...The only scenario I see where we can get cache based data destruction is if two cache aliases both contain dirty caches for the page (which can only happen for MAP_SHARED RW, which we already have the correct semantics for...they're racing to do a msync and the last one in wins). James ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-23 22:21 ` James Bottomley @ 2003-08-23 22:51 ` David S. Miller 2003-08-23 23:01 ` James Bottomley 2003-08-23 22:53 ` David S. Miller 1 sibling, 1 reply; 30+ messages in thread From: David S. Miller @ 2003-08-23 22:51 UTC (permalink / raw) To: James Bottomley; +Cc: hugh, willy, linux-kernel, parisc-linux, drepper On 23 Aug 2003 17:21:21 -0500 James Bottomley <James.Bottomley@SteelEye.com> wrote: > On Sat, 2003-08-23 at 16:43, David S. Miller wrote: > > On 22 Aug 2003 20:09:30 -0500 > > James Bottomley <James.Bottomley@SteelEye.com> wrote: > > > > > MAP_PRIVATE > > > Create a private copy-on-write mapping. Stores > > > to the region do not affect the original file. > > > It is unspecified whether changes made to the > > > file after the mmap call are visible in the > > > mapped region. ... > Could you elaborate some more? I agree that the MAP_PRIVATE mapping may > not see cpu1's write because of cache incoherencies (but that's what I > believe is covered by the `unspecified' bit of the MAP_PRIVATE > definition above). Ok. Let me think about this a bit more. The safest solution for parisc, meanwhile, would be to walk the non-shared mmap list checking for any instance of the VM_MAYSHARE bit being set. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-23 22:51 ` David S. Miller @ 2003-08-23 23:01 ` James Bottomley 0 siblings, 0 replies; 30+ messages in thread From: James Bottomley @ 2003-08-23 23:01 UTC (permalink / raw) To: David S. Miller; +Cc: hugh, willy, Linux Kernel, PARISC list, drepper On Sat, 2003-08-23 at 17:51, David S. Miller wrote: > Ok. Let me think about this a bit more. > > The safest solution for parisc, meanwhile, would be to walk the > non-shared mmap list checking for any instance of the VM_MAYSHARE bit > being set. Right, that's how I plan to fix this problem in parisc. We also need the VM_MAYSHARE flag to propagate across remappings, which was the general kernel fix I sent some emails back. James ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-23 22:21 ` James Bottomley 2003-08-23 22:51 ` David S. Miller @ 2003-08-23 22:53 ` David S. Miller 2003-08-23 23:11 ` James Bottomley 1 sibling, 1 reply; 30+ messages in thread From: David S. Miller @ 2003-08-23 22:53 UTC (permalink / raw) To: James Bottomley; +Cc: hugh, willy, linux-kernel, parisc-linux, drepper On 23 Aug 2003 17:21:21 -0500 James Bottomley <James.Bottomley@SteelEye.com> wrote: > On Sat, 2003-08-23 at 16:43, David S. Miller wrote: > > On 22 Aug 2003 20:09:30 -0500 > > James Bottomley <James.Bottomley@SteelEye.com> wrote: > > > > > To avoid having to flush the non-shared mappings (basically on parisc if > > > you write to a file backing a MAP_PRIVATE mapping then we don't > > > guarantee you see the update). BTW, what gains to you really get from this optimization? How often do writes happen to files while private mappings to it exist? :-) This is one of the reasons I think this discussion is a bit silly. What specific cases does your optimization help, and how common is it? Show us some numbers. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-23 22:53 ` David S. Miller @ 2003-08-23 23:11 ` James Bottomley 2003-08-24 0:22 ` David S. Miller 0 siblings, 1 reply; 30+ messages in thread From: James Bottomley @ 2003-08-23 23:11 UTC (permalink / raw) To: David S. Miller; +Cc: hugh, willy, Linux Kernel, PARISC list, drepper On Sat, 2003-08-23 at 17:53, David S. Miller wrote: > BTW, what gains to you really get from this optimization? > > How often do writes happen to files while private mappings > to it exist? :-) This is one of the reasons I think this > discussion is a bit silly. > > What specific cases does your optimization help, and how common is it? > Show us some numbers. Not having to flush the private mappings is a huge optimisation. Our current flush_dcache_page implementation allows us only to flush a single page and get all the aliased caches updated because we carefully align MAP_SHARED areas (by supplying our own arch_get_unmapped_area()). However, the alignment constraint is 4MB to get this property of the virtually aliased caches, so we can't afford to align all mappings like this (for our 32 bit userspace, anyway). If we were to have to flush the private i_mmap list, we'd have to do a page flush for *every* entry in the list (that's 256 instructions per page at a cache width of 16 bytes). This would be a horrific overhead. using the VM_MAYSHARE to carry the read only shared mapping semantics indication still allows us to align correctly, but the only additional overhead we incur is to walk the i_mmap list to find VM_MAYSHARE mappings as well as i_mmap_shared. Since we can key of VM_MAYSHARE to do the alignment, we still only need flush the first one we come to. This all works as long as we can agree there are no pathological mmap cases that force us to flush *all* the i_mmap mappings...which is what I think the discussion has come down to. James ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-23 23:11 ` James Bottomley @ 2003-08-24 0:22 ` David S. Miller [not found] ` <1061702282.1992.1153.camel@mulgrave> 0 siblings, 1 reply; 30+ messages in thread From: David S. Miller @ 2003-08-24 0:22 UTC (permalink / raw) To: James Bottomley; +Cc: hugh, willy, linux-kernel, parisc-linux, drepper On 23 Aug 2003 18:11:16 -0500 James Bottomley <James.Bottomley@SteelEye.com> wrote: > On Sat, 2003-08-23 at 17:53, David S. Miller wrote: > > How often do writes happen to files while private mappings > > to it exist? :-) This is one of the reasons I think this > > discussion is a bit silly. ... > Not having to flush the private mappings is a huge optimisation. You're not answering my question :( I know that when the case _DOES_ happen, your optimization is worthwhile, my question is not about this. My question is about how often does this case happen. ^ permalink raw reply [flat|nested] 30+ messages in thread
[parent not found: <1061702282.1992.1153.camel@mulgrave>]
[parent not found: <20030823222300.4695a0c4.davem@redhat.com>]
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) [not found] ` <20030823222300.4695a0c4.davem@redhat.com> @ 2003-08-24 16:54 ` James Bottomley 0 siblings, 0 replies; 30+ messages in thread From: James Bottomley @ 2003-08-24 16:54 UTC (permalink / raw) To: David S. Miller; +Cc: hugh, willy, Linux Kernel, PARISC list, drepper On Sun, 2003-08-24 at 00:23, David S. Miller wrote: > This is what I'm asking of you, to find cases in real life, > not some contrived example, where your optimization helps > appreciably. Oh, OK, that's easy...it's what the glibc test was designed for. In glibc, for each file you fopen as a mapped file, you seem to get a separate mmapping of the file (this actually looks wrong to me...it seems glibc should have only one mapping per file which all the file objects share, but anyway). That means we get one entry in one of the i_mmaps lists for each of the opens. Since these are files, the chances are they'll be read and written which will generate lots of dcache flushes. This case would kill us if we had to flush every entry in i_mmap. Right at the moment, you have to specifically request that the file be mmapped by specifying the "m" modifier, but glibc seems to be migrating to this being the default one day...that's what I want to be ready for. Besides the optimisation adds no overhead and compromises nothing, so its worth doing regardless. James ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 18:34 ` Hugh Dickins 2003-08-22 18:31 ` David S. Miller @ 2003-08-22 18:41 ` James Bottomley 2003-08-22 19:02 ` Hugh Dickins 2003-08-22 19:09 ` Randolph Chung 1 sibling, 2 replies; 30+ messages in thread From: James Bottomley @ 2003-08-22 18:41 UTC (permalink / raw) To: Hugh Dickins; +Cc: David S. Miller, willy, Linux Kernel, PARISC list, drepper On Fri, 2003-08-22 at 13:34, Hugh Dickins wrote: > Might the problem be in parisc's __flush_dcache_page, > which only examines i_mmap_shared? This is the issue: we do treat them differently. Semantics differ between privately mapped data (where there's no coherency guarantee) and shared data (where there is). Flushing the virtual cache is expensive on pa, so we only do it for the i_mmap_shared list. The difficulty is that a mmap of a read only file with MAP_SHARED is expecting the shared cache semantics, but gets added to the non shared list. Since flushing the caches is a performance hog, we'd like do be able to distinguish the cases where we have to do the flush MAP_SHARED mappings from those we don't (MAP_PRIVATE). James ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 18:41 ` James Bottomley @ 2003-08-22 19:02 ` Hugh Dickins 2003-08-22 19:09 ` Randolph Chung 1 sibling, 0 replies; 30+ messages in thread From: Hugh Dickins @ 2003-08-22 19:02 UTC (permalink / raw) To: James Bottomley Cc: David S. Miller, willy, Linux Kernel, PARISC list, drepper On 22 Aug 2003, James Bottomley wrote: > On Fri, 2003-08-22 at 13:34, Hugh Dickins wrote: > > Might the problem be in parisc's __flush_dcache_page, > > which only examines i_mmap_shared? > > This is the issue: we do treat them differently. > > Semantics differ between privately mapped data (where there's no > coherency guarantee) and shared data (where there is). Flushing the > virtual cache is expensive on pa, so we only do it for the i_mmap_shared > list. > > The difficulty is that a mmap of a read only file with MAP_SHARED is > expecting the shared cache semantics, but gets added to the non shared > list. > > Since flushing the caches is a performance hog, we'd like do be able to > distinguish the cases where we have to do the flush MAP_SHARED mappings > from those we don't (MAP_PRIVATE). The naming "i_mmap_shared" does suggest that once upon a time those lists were as you'd like; and that at some point it was changed. Perhaps some arches prefer the coherency guarantee I'm familiar with in MAP_PRIVATE (yes, when you modify a page yourself, it cows off and becomes private; but in i386 it's shared up until then), and other arches (like yours) would prefer to avoid the overhead. Hugh ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 18:41 ` James Bottomley 2003-08-22 19:02 ` Hugh Dickins @ 2003-08-22 19:09 ` Randolph Chung 1 sibling, 0 replies; 30+ messages in thread From: Randolph Chung @ 2003-08-22 19:09 UTC (permalink / raw) To: James Bottomley Cc: Hugh Dickins, David S. Miller, willy, Linux Kernel, PARISC list, drepper In reference to a message from James Bottomley, dated Aug 22: > On Fri, 2003-08-22 at 13:34, Hugh Dickins wrote: > > Might the problem be in parisc's __flush_dcache_page, > > which only examines i_mmap_shared? > > This is the issue: we do treat them differently. as does some other archs, like ARM. are we saying that MAP_SHARED != VM_SHARED? the mmap code allows architectures to map pages differently if MAP_SHARED is specified, but it puts it on i_mmap vs i_mmap_shared using VM_SHARED, and for read-only files we silently drop VM_SHARED... so the page is mapped using MAP_SHARED semantics but placed on i_mmap.... confused, randolph -- Randolph Chung Debian GNU/Linux Developer, hppa/ia64 ports http://www.tausq.org/ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 16:34 ` [parisc-linux] " Matthew Wilcox 2003-08-22 16:39 ` David S. Miller @ 2003-08-22 16:42 ` Russell King 2003-08-22 16:39 ` David S. Miller 1 sibling, 1 reply; 30+ messages in thread From: Russell King @ 2003-08-22 16:42 UTC (permalink / raw) To: Matthew Wilcox Cc: David S. Miller, James Bottomley, linux-kernel, parisc-linux, drepper On Fri, Aug 22, 2003 at 05:34:29PM +0100, Matthew Wilcox wrote: > On Fri, Aug 22, 2003 at 09:14:47AM -0700, David S. Miller wrote: > > On 22 Aug 2003 09:40:37 -0500 > > James Bottomley <James.Bottomley@SteelEye.com> wrote: > > > The reason this doesn't happen is because the mapping is not on the > > > mmap_shared list that flush_dcache_page() updates. > > > > flush_dcache_page() checks both the shared and non-shared mmap lists, > > so if it is on _either_ list it is flushed. It does not check only > > the shared list. Eww. > Gah, that's going to get really inefficient. I still think we want to > split flush_dcache_page() into two operations -- flush_dcache_user() and > flush_dcache_kernel(). flush_dcache_user() would flush this specific > user mapping back to ram and flush_dcache_kernel() would flush the > kernel mapping. Where are you proposing calling only _user() and _kernel() from ? -- Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux http://www.arm.linux.org.uk/personal/aboutme.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [parisc-linux] Re: Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) 2003-08-22 16:42 ` Russell King @ 2003-08-22 16:39 ` David S. Miller 0 siblings, 0 replies; 30+ messages in thread From: David S. Miller @ 2003-08-22 16:39 UTC (permalink / raw) To: Russell King; +Cc: willy, James.Bottomley, linux-kernel, parisc-linux, drepper On Fri, 22 Aug 2003 17:42:03 +0100 Russell King <rmk@arm.linux.org.uk> wrote: > > Gah, that's going to get really inefficient. I still think we want to > > split flush_dcache_page() into two operations -- flush_dcache_user() and > > flush_dcache_kernel(). flush_dcache_user() would flush this specific > > user mapping back to ram and flush_dcache_kernel() would flush the > > kernel mapping. > > Where are you proposing calling only _user() and _kernel() from ? The is not acceptable answer. Purely, flush_dcache_page() is defined to execute when the kernel stores into a page cache page, and that is it's only valid definition. Splitting into a "user" part makes absolutely no sense. ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2003-08-24 16:54 UTC | newest] Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-08-22 14:40 Problems with kernel mmap (failing tst-mmap-eofsync in glibc on parisc) James Bottomley 2003-08-22 16:14 ` David S. Miller 2003-08-22 16:34 ` [parisc-linux] " Matthew Wilcox 2003-08-22 16:39 ` David S. Miller 2003-08-22 17:41 ` Matthew Wilcox 2003-08-22 17:36 ` David S. Miller 2003-08-22 18:01 ` David S. Miller 2003-08-22 18:34 ` Hugh Dickins 2003-08-22 18:31 ` David S. Miller 2003-08-22 18:56 ` James Bottomley 2003-08-22 19:19 ` David S. Miller 2003-08-22 22:27 ` James Bottomley 2003-08-22 22:41 ` David S. Miller 2003-08-23 1:09 ` James Bottomley 2003-08-23 7:22 ` Hugh Dickins 2003-08-23 15:59 ` James Bottomley 2003-08-23 21:44 ` David S. Miller 2003-08-23 21:43 ` David S. Miller 2003-08-23 22:21 ` James Bottomley 2003-08-23 22:51 ` David S. Miller 2003-08-23 23:01 ` James Bottomley 2003-08-23 22:53 ` David S. Miller 2003-08-23 23:11 ` James Bottomley 2003-08-24 0:22 ` David S. Miller [not found] ` <1061702282.1992.1153.camel@mulgrave> [not found] ` <20030823222300.4695a0c4.davem@redhat.com> 2003-08-24 16:54 ` James Bottomley 2003-08-22 18:41 ` James Bottomley 2003-08-22 19:02 ` Hugh Dickins 2003-08-22 19:09 ` Randolph Chung 2003-08-22 16:42 ` Russell King 2003-08-22 16:39 ` David S. Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).