From: Suren Baghdasaryan <surenb@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: llvm@lists.linux.dev, oe-kbuild-all@lists.linux.dev,
	 kernel test robot <lkp@intel.com>
Subject: Re: [PATCH 1/1] mm/khugepaged: fix vm_lock/i_mmap_rwsem inversion in retract_page_tables
Date: Sat, 4 Mar 2023 15:24:15 -0800	[thread overview]
Message-ID: <CAJuCfpFjWhtzRE1X=J+_JjgJzNKhq-=JT8yTBSTHthwp0pqWZw@mail.gmail.com> (raw)
In-Reply-To: <202303041842.Cv8cTjlL-lkp@intel.com>

Hi Andrew,
I missed the vma_try_start_write() definition for the CONFIG_PER_VMA_LOCK=n
configuration. Could you please fold in the following fix-up?

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -740,6 +740,8 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 		{ return false; }
 static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
+static inline bool vma_try_start_write(struct vm_area_struct *vma)
+		{ return true; }
 static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}

or should I send a separate patch?
Thanks,
Suren.



On Sat, Mar 4, 2023 at 2:57 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Suren,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on akpm-mm/mm-everything]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Suren-Baghdasaryan/mm-khugepaged-fix-vm_lock-i_mmap_rwsem-inversion-in-retract_page_tables/20230304-053401
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link:    https://lore.kernel.org/r/20230303213250.3555716-1-surenb%40google.com
> patch subject: [PATCH 1/1] mm/khugepaged: fix vm_lock/i_mmap_rwsem inversion in retract_page_tables
> config: i386-randconfig-a013 (https://download.01.org/0day-ci/archive/20230304/202303041842.Cv8cTjlL-lkp@intel.com/config)
> compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/intel-lab-lkp/linux/commit/338d2ee433a0559b3c6c3966b8d4ffe55e8dc6c9
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Suren-Baghdasaryan/mm-khugepaged-fix-vm_lock-i_mmap_rwsem-inversion-in-retract_page_tables/20230304-053401
>         git checkout 338d2ee433a0559b3c6c3966b8d4ffe55e8dc6c9
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <lkp@intel.com>
> | Link: https://lore.kernel.org/oe-kbuild-all/202303041842.Cv8cTjlL-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> >> mm/khugepaged.c:1799:9: error: implicit declaration of function 'vma_try_start_write' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
>                            if (!vma_try_start_write(vma))
>                                 ^
>    mm/khugepaged.c:1799:9: note: did you mean 'vma_start_write'?
>    include/linux/mm.h:742:20: note: 'vma_start_write' declared here
>    static inline void vma_start_write(struct vm_area_struct *vma) {}
>                       ^
>    1 error generated.
>
>
> vim +/vma_try_start_write +1799 mm/khugepaged.c
>
>   1736
>   1737  static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
>   1738                                 struct mm_struct *target_mm,
>   1739                                 unsigned long target_addr, struct page *hpage,
>   1740                                 struct collapse_control *cc)
>   1741  {
>   1742          struct vm_area_struct *vma;
>   1743          int target_result = SCAN_FAIL;
>   1744
>   1745          i_mmap_lock_write(mapping);
>   1746          vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
>   1747                  int result = SCAN_FAIL;
>   1748                  struct mm_struct *mm = NULL;
>   1749                  unsigned long addr = 0;
>   1750                  pmd_t *pmd;
>   1751                  bool is_target = false;
>   1752
>   1753                  /*
>   1754                   * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
>   1755                   * got written to. These VMAs are likely not worth investing
>   1756                   * mmap_write_lock(mm) as PMD-mapping is likely to be split
>   1757                   * later.
>   1758                   *
>   1759                   * Note that vma->anon_vma check is racy: it can be set up after
>   1760                   * the check but before we took mmap_lock by the fault path.
>   1761                   * But page lock would prevent establishing any new ptes of the
>   1762                   * page, so we are safe.
>   1763                   *
>   1764                   * An alternative would be drop the check, but check that page
>   1765                   * table is clear before calling pmdp_collapse_flush() under
>   1766                   * ptl. It has higher chance to recover THP for the VMA, but
>   1767                   * has higher cost too. It would also probably require locking
>   1768                   * the anon_vma.
>   1769                   */
>   1770                  if (READ_ONCE(vma->anon_vma)) {
>   1771                          result = SCAN_PAGE_ANON;
>   1772                          goto next;
>   1773                  }
>   1774                  addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
>   1775                  if (addr & ~HPAGE_PMD_MASK ||
>   1776                      vma->vm_end < addr + HPAGE_PMD_SIZE) {
>   1777                          result = SCAN_VMA_CHECK;
>   1778                          goto next;
>   1779                  }
>   1780                  mm = vma->vm_mm;
>   1781                  is_target = mm == target_mm && addr == target_addr;
>   1782                  result = find_pmd_or_thp_or_none(mm, addr, &pmd);
>   1783                  if (result != SCAN_SUCCEED)
>   1784                          goto next;
>   1785                  /*
>   1786                   * We need exclusive mmap_lock to retract page table.
>   1787                   *
>   1788                   * We use trylock due to lock inversion: we need to acquire
>   1789                   * mmap_lock while holding page lock. Fault path does it in
>   1790                   * reverse order. Trylock is a way to avoid deadlock.
>   1791                   *
>   1792                   * Also, it's not MADV_COLLAPSE's job to collapse other
>   1793                   * mappings - let khugepaged take care of them later.
>   1794                   */
>   1795                  result = SCAN_PTE_MAPPED_HUGEPAGE;
>   1796                  if ((cc->is_khugepaged || is_target) &&
>   1797                      mmap_write_trylock(mm)) {
>   1798                          /* trylock for the same lock inversion as above */
> > 1799                          if (!vma_try_start_write(vma))
>   1800                                  goto unlock_next;
>   1801
>   1802                          /*
>   1803                           * Re-check whether we have an ->anon_vma, because
>   1804                           * collapse_and_free_pmd() requires that either no
>   1805                           * ->anon_vma exists or the anon_vma is locked.
>   1806                           * We already checked ->anon_vma above, but that check
>   1807                           * is racy because ->anon_vma can be populated under the
>   1808                           * mmap lock in read mode.
>   1809                           */
>   1810                          if (vma->anon_vma) {
>   1811                                  result = SCAN_PAGE_ANON;
>   1812                                  goto unlock_next;
>   1813                          }
>   1814                          /*
>   1815                           * When a vma is registered with uffd-wp, we can't
>   1816                           * recycle the pmd pgtable because there can be pte
>   1817                           * markers installed.  Skip it only, so the rest mm/vma
>   1818                           * can still have the same file mapped hugely, however
>   1819                           * it'll always mapped in small page size for uffd-wp
>   1820                           * registered ranges.
>   1821                           */
>   1822                          if (hpage_collapse_test_exit(mm)) {
>   1823                                  result = SCAN_ANY_PROCESS;
>   1824                                  goto unlock_next;
>   1825                          }
>   1826                          if (userfaultfd_wp(vma)) {
>   1827                                  result = SCAN_PTE_UFFD_WP;
>   1828                                  goto unlock_next;
>   1829                          }
>   1830                          collapse_and_free_pmd(mm, vma, addr, pmd);
>   1831                          if (!cc->is_khugepaged && is_target)
>   1832                                  result = set_huge_pmd(vma, addr, pmd, hpage);
>   1833                          else
>   1834                                  result = SCAN_SUCCEED;
>   1835
>   1836  unlock_next:
>   1837                          mmap_write_unlock(mm);
>   1838                          goto next;
>   1839                  }
>   1840                  /*
>   1841                   * Calling context will handle target mm/addr. Otherwise, let
>   1842                   * khugepaged try again later.
>   1843                   */
>   1844                  if (!is_target) {
>   1845                          khugepaged_add_pte_mapped_thp(mm, addr);
>   1846                          continue;
>   1847                  }
>   1848  next:
>   1849                  if (is_target)
>   1850                          target_result = result;
>   1851          }
>   1852          i_mmap_unlock_write(mapping);
>   1853          return target_result;
>   1854  }
>   1855
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests

Thread overview: 6+ messages
2023-03-03 21:32 [PATCH 1/1] mm/khugepaged: fix vm_lock/i_mmap_rwsem inversion in retract_page_tables Suren Baghdasaryan
2023-03-04  9:14 ` kernel test robot
2023-03-04  9:14 ` kernel test robot
2023-03-04 10:57 ` kernel test robot
2023-03-04 23:24   ` Suren Baghdasaryan [this message]
2023-03-04 23:25 ` Suren Baghdasaryan
