On 21.11.22 16:00, Muhammad Usama Anjum wrote:
> Hello,
>
> Thank you for replying.
>
> On 11/14/22 8:46 PM, David Hildenbrand wrote:
>>> The soft-dirtiness is stored in the PTE. VMA is marked dirty to store the
>>> dirtiness for reused regions. Clearing the soft-dirty status of the whole
>>> process is straightforward. When we want to clear/monitor the
>>> soft-dirtiness of a part of the virtual memory, there is a lot of internal
>>> noise. We don't want the non-dirty pages to become dirty because of how the
>>> soft-dirty feature has been working. The soft-dirty feature wasn't being used
>>> the way we want to use it now. While monitoring a part of memory, it is not
>>> acceptable to get non-dirty pages as dirty. Non-dirty pages become dirty
>>> when two VMAs are merged without considering if they both are dirty or
>>> not (34228d473efe). To monitor changes over the memory, sometimes VMAs are
>>> split to clear the soft-dirty bit in the VMA flags. But sometimes the kernel
>>> decides to merge them back up. It is such a waste of resources.
>>
>> Maybe you'd want a per-process option to not merge if the VM_SOFTDIRTY
>> property differs. But that might be just one alternative for handling this
>> case.
>>
>>>
>>> To keep things consistent, the default behavior of the IOCTL is to output
>>> even the extra non-dirty pages as dirty from the kernel noise. An optional
>>> PAGEMAP_NO_REUSED_REGIONS flag is added for those use cases which aren't
>>> tolerant of extra non-dirty pages. This flag can be considered as something
>>> which is by-passing the already present buggy implementation in the kernel.
>>> It is not buggy per se as the issue can be solved if we don't allow the
>>> two VMAs which have different soft-dirty bits to get merged. But we are
>>> allowing that so that the total number of VMAs doesn't increase. This was
>>> acceptable at the time, but now the use case of monitoring a part of
>>> memory for soft-dirty doesn't want this merging. So either we need to
>>> revert 34228d473efe and the PAGEMAP_NO_REUSED_REGIONS flag will not be needed
>>> or we should allow PAGEMAP_NO_REUSED_REGIONS or a similar mechanism to ignore
>>> the extra dirty pages which aren't dirty in reality.
>>>
>>> When the PAGEMAP_NO_REUSED_REGIONS flag is used, only the PTEs are checked to
>>> find if the pages are dirty. So re-used regions cannot be detected. This
>>> has the only side-effect of not checking the VMAs. So this is a limitation of
>>> using this flag which should be acceptable in the current state of code.
>>> This limitation is okay for the users as they can clear the soft-dirty bit
>>> of the VMA before starting to monitor a range of memory for soft-dirtiness.
>>>
>>>
>>>> Please separate that part out from the other changes; I am still not
>>>> convinced that we want this and what the semantical implications are.
>>>>
>>>> Let's take a look at an example: can_change_pte_writable()
>>>>
>>>>      /* Do we need write faults for softdirty tracking? */
>>>>      if (vma_soft_dirty_enabled(vma) && !pte_soft_dirty(pte))
>>>>          return false;
>>>>
>>>> We care about PTE softdirty tracking, if it is enabled for the VMA.
>>>> Tracking is enabled if: vma_soft_dirty_enabled()
>>>>
>>>>      /*
>>>>       * Soft-dirty is kind of special: its tracking is enabled when
>>>>       * the vma flags not set.
>>>>       */
>>>>      return !(vma->vm_flags & VM_SOFTDIRTY);
>>>>
>>>> Consequently, if VM_SOFTDIRTY is set, we are not considering the soft_dirty
>>>> PTE bits accordingly.
>>> Sorry, I'm unable to completely grasp the meaning of the example. We have
>>> followed clear_refs_write() to write the soft-dirty bit clearing code in
>>> the current patch. Dirtiness of the VMA and the PTE may be set
>>> independently. Newly allocated memory has the dirty bit set in the VMA. When
>>> something is written to the memory, the soft-dirty bit is set in the PTEs as
>>> well, regardless of whether the soft-dirty bit is set in the VMA or not.
>>>
>>
>> Let me try to find a simple explanation:
>>
>> After clearing a SOFTDIRTY PTE flag inside an area with VM_SOFTDIRTY set,
>> there are ways that PTE could get written to and it could become dirty,
>> without the PTE becoming softdirty.
>>
>> Essentially, inside a VMA with VM_SOFTDIRTY set, the PTE softdirty values
>> might be stale: there might be entries that are softdirty even though the
>> PTE is *not* marked softdirty.
> Can someone please share the example to reproduce this? In all of my
> testing, even if I ignore VM_SOFTDIRTY and only base my decision of
> soft-dirtiness on individual pages, it always passes.

Quick reproducer (the first and easiest one that triggered :) ) attached.

With no kernel changes, it works as expected.

# ./softdirty_mprotect

With the following kernel change to simulate what you propose, it fails:

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index d22687d2e81e..f2c682bf7f64 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1457,8 +1457,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
                 flags |= PM_FILE;
         if (page && !migration && page_mapcount(page) == 1)
                 flags |= PM_MMAP_EXCLUSIVE;
-        if (vma->vm_flags & VM_SOFTDIRTY)
-                flags |= PM_SOFT_DIRTY;
+        //if (vma->vm_flags & VM_SOFTDIRTY)
+        //        flags |= PM_SOFT_DIRTY;
 
         return make_pme(frame, flags);
 }

# ./softdirty_mprotect
Page #1 should be softdirty

-- 
Thanks,

David / dhildenb
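
The attachment itself is not reproduced in this copy of the mail. To make the argument above easier to follow, here is a toy user-space model of the logic, not the attached reproducer and not kernel code. All `toy_*` names are hypothetical. It assumes, as a simplification, that a write fault is the only thing that sets the per-PTE soft-dirty bit, and that VM_SOFTDIRTY can reappear on a VMA after a clear (for example through a merge with a freshly mapped neighbour, as discussed above).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are NOT kernel structures. */
struct toy_vma {
	bool vm_softdirty;	/* models VM_SOFTDIRTY in vma->vm_flags */
};

struct toy_pte {
	bool soft_dirty;	/* models the per-PTE soft-dirty bit */
	bool writable;		/* models the write permission in the PTE */
};

/* Mirrors the quoted vma_soft_dirty_enabled(): tracking is on when the flag is NOT set. */
static bool toy_tracking_enabled(const struct toy_vma *vma)
{
	return !vma->vm_softdirty;
}

/* Mirrors only the soft-dirty check quoted from can_change_pte_writable(). */
static bool toy_can_change_pte_writable(const struct toy_vma *vma,
					const struct toy_pte *pte)
{
	if (toy_tracking_enabled(vma) && !pte->soft_dirty)
		return false;
	return true;
}

/* Models clearing soft-dirty: drop both bits and write-protect the PTE. */
static void toy_clear_refs(struct toy_vma *vma, struct toy_pte *pte)
{
	vma->vm_softdirty = false;
	pte->soft_dirty = false;
	pte->writable = false;
}

/* Models mprotect(PROT_READ) followed by mprotect(PROT_READ | PROT_WRITE). */
static void toy_mprotect_roundtrip(const struct toy_vma *vma, struct toy_pte *pte)
{
	pte->writable = false;
	pte->writable = toy_can_change_pte_writable(vma, pte);
}

/* A write only faults (and sets soft-dirty) when the PTE is not writable. */
static void toy_write(struct toy_pte *pte)
{
	if (!pte->writable) {
		pte->soft_dirty = true;
		pte->writable = true;
	}
}

/*
 * Returns what would be reported as soft-dirty for a page that WAS written.
 * vma_redirtied: VM_SOFTDIRTY comes back on the VMA after the clear.
 * pte_only:      report from the PTE bit alone, ignoring the VMA flag.
 */
bool toy_scenario(bool vma_redirtied, bool pte_only)
{
	struct toy_vma vma = { .vm_softdirty = true };
	struct toy_pte pte = { .soft_dirty = true, .writable = true };

	toy_clear_refs(&vma, &pte);
	if (vma_redirtied)
		vma.vm_softdirty = true;
	toy_mprotect_roundtrip(&vma, &pte);
	toy_write(&pte);

	if (pte_only)
		return pte.soft_dirty;
	return vma.vm_softdirty || pte.soft_dirty;
}
```

In this model, `toy_scenario(true, true)` is the only combination that reports a written page as clean: with the VMA flag set, the PTE is made writable again without a fault, so the PTE bit goes stale and only the VMA flag still covers the page.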