All of lore.kernel.org
 help / color / mirror / Atom feed
* [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces)
@ 2024-01-25 10:22 kernel test robot
  2024-01-25 21:27 ` Suren Baghdasaryan
  0 siblings, 1 reply; 11+ messages in thread
From: kernel test robot @ 2024-01-25 10:22 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: oe-kbuild-all, Linux Memory Management List, Andrew Morton

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   01af33cc9894b4489fb68fa35c40e9fe85df63dc
commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU
config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct file [noderef] __rcu **f @@     got struct file ** @@
   fs/proc/task_mmu.c:143:45: sparse:     expected struct file [noderef] __rcu **f
   fs/proc/task_mmu.c:143:45: sparse:     got struct file **
   fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
   include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock
   fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block
   include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock
   include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock
   include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock
   include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock
   include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock
   fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
   arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
   arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression

vim +143 fs/proc/task_mmu.c

   132	
   133	/*
   134	 * Take VMA snapshot and pin vm_file and anon_name as they are used by
   135	 * show_map_vma.
   136	 */
   137	static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
   138	{
   139		struct vm_area_struct *copy = &priv->vma_copy;
   140		int ret = -EAGAIN;
   141	
   142		memcpy(copy, vma, sizeof(*vma));
 > 143		if (copy->vm_file && !get_file_rcu(&copy->vm_file))
   144			goto out;
   145	
   146		if (!anon_vma_name_get_if_valid(copy))
   147			goto put_file;
   148	
   149		if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm))
   150			return 0;
   151	
   152		/* Address space got modified, vma might be stale. Wait and retry. */
   153		rcu_read_unlock();
   154		ret = mmap_read_lock_killable(priv->mm);
   155		mmap_write_seq_record(priv->mm, &priv->mm_wr_seq);
   156		mmap_read_unlock(priv->mm);
   157		rcu_read_lock();
   158	
   159		if (!ret)
   160			ret = -EAGAIN; /* no other errors, ok to retry */
   161	
   162		anon_vma_name_put_if_valid(copy);
   163	put_file:
   164		if (copy->vm_file)
   165			fput(copy->vm_file);
   166	out:
   167		return ret;
   168	}
   169	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces)
  2024-01-25 10:22 [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) kernel test robot
@ 2024-01-25 21:27 ` Suren Baghdasaryan
  2024-01-25 22:44   ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2024-01-25 21:27 UTC (permalink / raw)
  To: kernel test robot, Matthew Wilcox, Paul E . McKenney
  Cc: oe-kbuild-all, Linux Memory Management List, Andrew Morton

On Thu, Jan 25, 2024 at 2:23 AM kernel test robot <lkp@intel.com> wrote:
>
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> head:   01af33cc9894b4489fb68fa35c40e9fe85df63dc
> commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU
> config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/config)
> compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@intel.com/
>
> sparse warnings: (new ones prefixed by >>)
> >> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct file [noderef] __rcu **f @@     got struct file ** @@

Uh, this is a problem.
I missed that get_file_rcu() is used only with mm->exe_file and
vma->vm_file is not really RCU-safe. It's freed via a call to fput()
which schedules its freeing using schedule_delayed_work(..., 1) but I
don't think that constitutes RCU grace period. Paul, Matthew, could
you please confirm? In the meantime I'm going to ask Andrew to remove
my patchset from mm-unstable to be safe.


>    fs/proc/task_mmu.c:143:45: sparse:     expected struct file [noderef] __rcu **f
>    fs/proc/task_mmu.c:143:45: sparse:     got struct file **
>    fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
>    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock
>    fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block
>    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock
>    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock
>    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock
>    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock
>    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock
>    fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
>    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
>    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
>
> vim +143 fs/proc/task_mmu.c
>
>    132
>    133  /*
>    134   * Take VMA snapshot and pin vm_file and anon_name as they are used by
>    135   * show_map_vma.
>    136   */
>    137  static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
>    138  {
>    139          struct vm_area_struct *copy = &priv->vma_copy;
>    140          int ret = -EAGAIN;
>    141
>    142          memcpy(copy, vma, sizeof(*vma));
>  > 143          if (copy->vm_file && !get_file_rcu(&copy->vm_file))
>    144                  goto out;
>    145
>    146          if (!anon_vma_name_get_if_valid(copy))
>    147                  goto put_file;
>    148
>    149          if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm))
>    150                  return 0;
>    151
>    152          /* Address space got modified, vma might be stale. Wait and retry. */
>    153          rcu_read_unlock();
>    154          ret = mmap_read_lock_killable(priv->mm);
>    155          mmap_write_seq_record(priv->mm, &priv->mm_wr_seq);
>    156          mmap_read_unlock(priv->mm);
>    157          rcu_read_lock();
>    158
>    159          if (!ret)
>    160                  ret = -EAGAIN; /* no other errors, ok to retry */
>    161
>    162          anon_vma_name_put_if_valid(copy);
>    163  put_file:
>    164          if (copy->vm_file)
>    165                  fput(copy->vm_file);
>    166  out:
>    167          return ret;
>    168  }
>    169
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces)
  2024-01-25 21:27 ` Suren Baghdasaryan
@ 2024-01-25 22:44   ` Paul E. McKenney
  2024-01-25 23:17     ` Suren Baghdasaryan
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2024-01-25 22:44 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: kernel test robot, Matthew Wilcox, oe-kbuild-all,
	Linux Memory Management List, Andrew Morton

On Thu, Jan 25, 2024 at 01:27:34PM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 25, 2024 at 2:23 AM kernel test robot <lkp@intel.com> wrote:
> >
> > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > head:   01af33cc9894b4489fb68fa35c40e9fe85df63dc
> > commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU
> > config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/config)
> > compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@intel.com/
> >
> > sparse warnings: (new ones prefixed by >>)
> > >> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct file [noderef] __rcu **f @@     got struct file ** @@
> 
> Uh, this is a problem.
> I missed that get_file_rcu() is used only with mm->exe_file and
> vma->vm_file is not really RCU-safe. It's freed via a call to fput()
> which schedules its freeing using schedule_delayed_work(..., 1) but I
> don't think that constitutes RCU grace period. Paul, Matthew, could
> you please confirm? In the meantime I'm going to ask Andrew to remove
> my patchset from mm-unstable to be safe.

Sadly, no, schedule_delayed_work() does not imply an RCU grace period.

There is a queue_rcu_work() that schedules work after a grace period,
which could be combined with a timer to get the delay.

Another approach would be to use get_state_synchronize_rcu() before
the schedule_delayed_work() in fput(), then do cond_synchronize_rcu()
in delayed_fput().  This would require adding an unsigned long to
struct file to keep track of which grace period a given struct file
needed to wait for.

Perhaps something like this:

------------------------------------------------------------------------

void fput(struct file *file)
{
	if (atomic_long_dec_and_test(&file->f_count)) {
		struct task_struct *task = current;

		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
			init_task_work(&file->f_rcuhead, ____fput);
			if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
				return;
			/*
			 * After this task has run exit_task_work(),
			 * task_work_add() will fail.  Fall through to delayed
			 * fput to avoid leaking *file.
			 */
		}

		file->f_rcu_seq = get_state_synchronize_rcu();
		if (llist_add(&file->f_llist, &delayed_fput_list))
			schedule_delayed_work(&delayed_fput_work, 1);
	}
}

------------------------------------------------------------------------

And this:

------------------------------------------------------------------------

static void delayed_fput(struct work_struct *unused)
{
	struct llist_node *node = llist_del_all(&delayed_fput_list);
	struct file *f, *t;

	llist_for_each_entry_safe(f, t, node, f_llist) {
		cond_synchronize_rcu(f->f_rcu_seq);
		__fput(f);
	}
}

------------------------------------------------------------------------

Note that if you called fput() on a long sequence of struct file
structures, the cond_synchronize_rcu() would be a near-noop almost all the
time, actually blocking at most about every once per every few jiffies.
After all, once a grace period has been waited for, it covers all of
the struct file structures that were passed to fput() during a given
RCU grace period.

Still, it would add the occasional delay.  And it would increase the
size of struct file, though there are workarounds for that, if size
is an issue.

Thoughts?

							Thanx, Paul

> >    fs/proc/task_mmu.c:143:45: sparse:     expected struct file [noderef] __rcu **f
> >    fs/proc/task_mmu.c:143:45: sparse:     got struct file **
> >    fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
> >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock
> >    fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block
> >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock
> >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock
> >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock
> >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock
> >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock
> >    fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
> >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> >
> > vim +143 fs/proc/task_mmu.c
> >
> >    132
> >    133  /*
> >    134   * Take VMA snapshot and pin vm_file and anon_name as they are used by
> >    135   * show_map_vma.
> >    136   */
> >    137  static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
> >    138  {
> >    139          struct vm_area_struct *copy = &priv->vma_copy;
> >    140          int ret = -EAGAIN;
> >    141
> >    142          memcpy(copy, vma, sizeof(*vma));
> >  > 143          if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> >    144                  goto out;
> >    145
> >    146          if (!anon_vma_name_get_if_valid(copy))
> >    147                  goto put_file;
> >    148
> >    149          if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm))
> >    150                  return 0;
> >    151
> >    152          /* Address space got modified, vma might be stale. Wait and retry. */
> >    153          rcu_read_unlock();
> >    154          ret = mmap_read_lock_killable(priv->mm);
> >    155          mmap_write_seq_record(priv->mm, &priv->mm_wr_seq);
> >    156          mmap_read_unlock(priv->mm);
> >    157          rcu_read_lock();
> >    158
> >    159          if (!ret)
> >    160                  ret = -EAGAIN; /* no other errors, ok to retry */
> >    161
> >    162          anon_vma_name_put_if_valid(copy);
> >    163  put_file:
> >    164          if (copy->vm_file)
> >    165                  fput(copy->vm_file);
> >    166  out:
> >    167          return ret;
> >    168  }
> >    169
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces)
  2024-01-25 22:44   ` Paul E. McKenney
@ 2024-01-25 23:17     ` Suren Baghdasaryan
  2024-01-26  0:35       ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2024-01-25 23:17 UTC (permalink / raw)
  To: paulmck
  Cc: kernel test robot, Matthew Wilcox, oe-kbuild-all,
	Linux Memory Management List, Andrew Morton

On Thu, Jan 25, 2024 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Thu, Jan 25, 2024 at 01:27:34PM -0800, Suren Baghdasaryan wrote:
> > On Thu, Jan 25, 2024 at 2:23 AM kernel test robot <lkp@intel.com> wrote:
> > >
> > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > head:   01af33cc9894b4489fb68fa35c40e9fe85df63dc
> > > commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU
> > > config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/config)
> > > compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/reproduce)
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <lkp@intel.com>
> > > | Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@intel.com/
> > >
> > > sparse warnings: (new ones prefixed by >>)
> > > >> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct file [noderef] __rcu **f @@     got struct file ** @@
> >
> > Uh, this is a problem.
> > I missed that get_file_rcu() is used only with mm->exe_file and
> > vma->vm_file is not really RCU-safe. It's freed via a call to fput()
> > which schedules its freeing using schedule_delayed_work(..., 1) but I
> > don't think that constitutes RCU grace period. Paul, Matthew, could
> > you please confirm? In the meantime I'm going to ask Andrew to remove
> > my patchset from mm-unstable to be safe.
>
> Sadly, no, schedule_delayed_work() does not imply an RCU grace period.
>
> There is a queue_rcu_work() that schedules work after a grace period,
> which could be combined with a timer to get the delay.
>
> Another approach would be to use get_state_synchronize_rcu() before
> the schedule_delayed_work() in fput(), then do cond_synchronize_rcu()
> in delayed_fput().  This would require adding an unsigned long to
> struct file to keep track of which grace period a given struct file
> needed to wait for.
>
> Perhaps something like this:
>
> ------------------------------------------------------------------------
>
> void fput(struct file *file)
> {
>         if (atomic_long_dec_and_test(&file->f_count)) {
>                 struct task_struct *task = current;
>
>                 if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
>                         init_task_work(&file->f_rcuhead, ____fput);
>                         if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
>                                 return;
>                         /*
>                          * After this task has run exit_task_work(),
>                          * task_work_add() will fail.  Fall through to delayed
>                          * fput to avoid leaking *file.
>                          */
>                 }
>
>                 file->f_rcu_seq = get_state_synchronize_rcu();
>                 if (llist_add(&file->f_llist, &delayed_fput_list))
>                         schedule_delayed_work(&delayed_fput_work, 1);
>         }
> }
>
> ------------------------------------------------------------------------
>
> And this:
>
> ------------------------------------------------------------------------
>
> static void delayed_fput(struct work_struct *unused)
> {
>         struct llist_node *node = llist_del_all(&delayed_fput_list);
>         struct file *f, *t;
>
>         llist_for_each_entry_safe(f, t, node, f_llist) {
>                 cond_synchronize_rcu(f->f_rcu_seq);
>                 __fput(f);
>         }
> }
>
> ------------------------------------------------------------------------
>
> Note that if you called fput() on a long sequence of struct file
> structures, the cond_synchronize_rcu() would be a near-noop almost all the
> time, actually blocking at most about every once per every few jiffies.
> After all, once a grace period has been waited for, it covers all of
> the struct file structures that were passed to fput() during a given
> RCU grace period.
>
> Still, it would add the occasional delay.  And it would increase the
> size of struct file, though there are workarounds for that, if size
> is an issue.
>
> Thoughts?

Thanks for the suggestion, Paul. I'm worried about this occasional
delay but otherwise this seems like a nice and simple approach. Do you
guys think that making *all* files RCU-safe with this approach is
warranted? For my particular case I need only vma->vm_file to be
RCU-safe but maybe there are other cases which would benefit from
this?

>
>                                                         Thanx, Paul
>
> > >    fs/proc/task_mmu.c:143:45: sparse:     expected struct file [noderef] __rcu **f
> > >    fs/proc/task_mmu.c:143:45: sparse:     got struct file **
> > >    fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
> > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock
> > >    fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block
> > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock
> > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock
> > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock
> > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock
> > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock
> > >    fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
> > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > >
> > > vim +143 fs/proc/task_mmu.c
> > >
> > >    132
> > >    133  /*
> > >    134   * Take VMA snapshot and pin vm_file and anon_name as they are used by
> > >    135   * show_map_vma.
> > >    136   */
> > >    137  static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
> > >    138  {
> > >    139          struct vm_area_struct *copy = &priv->vma_copy;
> > >    140          int ret = -EAGAIN;
> > >    141
> > >    142          memcpy(copy, vma, sizeof(*vma));
> > >  > 143          if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> > >    144                  goto out;
> > >    145
> > >    146          if (!anon_vma_name_get_if_valid(copy))
> > >    147                  goto put_file;
> > >    148
> > >    149          if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm))
> > >    150                  return 0;
> > >    151
> > >    152          /* Address space got modified, vma might be stale. Wait and retry. */
> > >    153          rcu_read_unlock();
> > >    154          ret = mmap_read_lock_killable(priv->mm);
> > >    155          mmap_write_seq_record(priv->mm, &priv->mm_wr_seq);
> > >    156          mmap_read_unlock(priv->mm);
> > >    157          rcu_read_lock();
> > >    158
> > >    159          if (!ret)
> > >    160                  ret = -EAGAIN; /* no other errors, ok to retry */
> > >    161
> > >    162          anon_vma_name_put_if_valid(copy);
> > >    163  put_file:
> > >    164          if (copy->vm_file)
> > >    165                  fput(copy->vm_file);
> > >    166  out:
> > >    167          return ret;
> > >    168  }
> > >    169
> > >
> > > --
> > > 0-DAY CI Kernel Test Service
> > > https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces)
  2024-01-25 23:17     ` Suren Baghdasaryan
@ 2024-01-26  0:35       ` Paul E. McKenney
  2024-01-26  2:24         ` Suren Baghdasaryan
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2024-01-26  0:35 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: kernel test robot, Matthew Wilcox, oe-kbuild-all,
	Linux Memory Management List, Andrew Morton

On Thu, Jan 25, 2024 at 03:17:17PM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 25, 2024 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Thu, Jan 25, 2024 at 01:27:34PM -0800, Suren Baghdasaryan wrote:
> > > On Thu, Jan 25, 2024 at 2:23 AM kernel test robot <lkp@intel.com> wrote:
> > > >
> > > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > > head:   01af33cc9894b4489fb68fa35c40e9fe85df63dc
> > > > commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU
> > > > config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/config)
> > > > compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> > > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/reproduce)
> > > >
> > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > the same patch/commit), kindly add following tags
> > > > | Reported-by: kernel test robot <lkp@intel.com>
> > > > | Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@intel.com/
> > > >
> > > > sparse warnings: (new ones prefixed by >>)
> > > > >> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct file [noderef] __rcu **f @@     got struct file ** @@
> > >
> > > Uh, this is a problem.
> > > I missed that get_file_rcu() is used only with mm->exe_file and
> > > vma->vm_file is not really RCU-safe. It's freed via a call to fput()
> > > which schedules its freeing using schedule_delayed_work(..., 1) but I
> > > don't think that constitutes RCU grace period. Paul, Matthew, could
> > > you please confirm? In the meantime I'm going to ask Andrew to remove
> > > my patchset from mm-unstable to be safe.
> >
> > Sadly, no, schedule_delayed_work() does not imply an RCU grace period.
> >
> > There is a queue_rcu_work() that schedules work after a grace period,
> > which could be combined with a timer to get the delay.
> >
> > Another approach would be to use get_state_synchronize_rcu() before
> > the schedule_delayed_work() in fput(), then do cond_synchronize_rcu()
> > in delayed_fput().  This would require adding an unsigned long to
> > struct file to keep track of which grace period a given struct file
> > needed to wait for.
> >
> > Perhaps something like this:
> >
> > ------------------------------------------------------------------------
> >
> > void fput(struct file *file)
> > {
> >         if (atomic_long_dec_and_test(&file->f_count)) {
> >                 struct task_struct *task = current;
> >
> >                 if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> >                         init_task_work(&file->f_rcuhead, ____fput);
> >                         if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
> >                                 return;
> >                         /*
> >                          * After this task has run exit_task_work(),
> >                          * task_work_add() will fail.  Fall through to delayed
> >                          * fput to avoid leaking *file.
> >                          */
> >                 }
> >
> >                 file->f_rcu_seq = get_state_synchronize_rcu();
> >                 if (llist_add(&file->f_llist, &delayed_fput_list))
> >                         schedule_delayed_work(&delayed_fput_work, 1);
> >         }
> > }
> >
> > ------------------------------------------------------------------------
> >
> > And this:
> >
> > ------------------------------------------------------------------------
> >
> > static void delayed_fput(struct work_struct *unused)
> > {
> >         struct llist_node *node = llist_del_all(&delayed_fput_list);
> >         struct file *f, *t;
> >
> >         llist_for_each_entry_safe(f, t, node, f_llist) {
> >                 cond_synchronize_rcu(f->f_rcu_seq);
> >                 __fput(f);
> >         }
> > }
> >
> > ------------------------------------------------------------------------
> >
> > Note that if you called fput() on a long sequence of struct file
> > structures, the cond_synchronize_rcu() would be a near-noop almost all the
> > time, actually blocking at most about every once per every few jiffies.
> > After all, once a grace period has been waited for, it covers all of
> > the struct file structures that were passed to fput() during a given
> > RCU grace period.
> >
> > Still, it would add the occasional delay.  And it would increase the
> > size of struct file, though there are workarounds for that, if size
> > is an issue.
> >
> > Thoughts?
> 
> Thanks for the suggestion, Paul. I'm worried about this occasional
> delay but otherwise this seems like a nice and simple approach.

One potential saving grace is that the more heavily loaded the mechanism,
the smaller a fraction of the cond_synchronize_rcu() calls will do
a delay.

>                                                                 Do you
> guys think that making *all* files RCU-safe with this approach is
> warranted? For my particular case I need only vma->vm_file to be
> RCU-safe but maybe there are other cases which would benefit from
> this?

To this, I can only give an unqualified "I don't know".  :-(

But if there is some condition that can be sampled on a per-file-structure
basis, you could use that to invoke cond_synchronize_rcu() only when
needed.  Or send only those file structures that need the extra delay
through queue_rcu_work(), perhaps by accumulating a list of them.

                                                        Thanx, Paul

> > > >    fs/proc/task_mmu.c:143:45: sparse:     expected struct file [noderef] __rcu **f
> > > >    fs/proc/task_mmu.c:143:45: sparse:     got struct file **
> > > >    fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
> > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock
> > > >    fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block
> > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock
> > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock
> > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock
> > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock
> > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock
> > > >    fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
> > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > >
> > > > vim +143 fs/proc/task_mmu.c
> > > >
> > > >    132
> > > >    133  /*
> > > >    134   * Take VMA snapshot and pin vm_file and anon_name as they are used by
> > > >    135   * show_map_vma.
> > > >    136   */
> > > >    137  static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
> > > >    138  {
> > > >    139          struct vm_area_struct *copy = &priv->vma_copy;
> > > >    140          int ret = -EAGAIN;
> > > >    141
> > > >    142          memcpy(copy, vma, sizeof(*vma));
> > > >  > 143          if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> > > >    144                  goto out;
> > > >    145
> > > >    146          if (!anon_vma_name_get_if_valid(copy))
> > > >    147                  goto put_file;
> > > >    148
> > > >    149          if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm))
> > > >    150                  return 0;
> > > >    151
> > > >    152          /* Address space got modified, vma might be stale. Wait and retry. */
> > > >    153          rcu_read_unlock();
> > > >    154          ret = mmap_read_lock_killable(priv->mm);
> > > >    155          mmap_write_seq_record(priv->mm, &priv->mm_wr_seq);
> > > >    156          mmap_read_unlock(priv->mm);
> > > >    157          rcu_read_lock();
> > > >    158
> > > >    159          if (!ret)
> > > >    160                  ret = -EAGAIN; /* no other errors, ok to retry */
> > > >    161
> > > >    162          anon_vma_name_put_if_valid(copy);
> > > >    163  put_file:
> > > >    164          if (copy->vm_file)
> > > >    165                  fput(copy->vm_file);
> > > >    166  out:
> > > >    167          return ret;
> > > >    168  }
> > > >    169
> > > >
> > > > --
> > > > 0-DAY CI Kernel Test Service
> > > > https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces)
  2024-01-26  0:35       ` Paul E. McKenney
@ 2024-01-26  2:24         ` Suren Baghdasaryan
  2024-01-26 10:22           ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2024-01-26  2:24 UTC (permalink / raw)
  To: paulmck
  Cc: kernel test robot, Matthew Wilcox, oe-kbuild-all,
	Linux Memory Management List, Andrew Morton

On Thu, Jan 25, 2024 at 4:35 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Thu, Jan 25, 2024 at 03:17:17PM -0800, Suren Baghdasaryan wrote:
> > On Thu, Jan 25, 2024 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Thu, Jan 25, 2024 at 01:27:34PM -0800, Suren Baghdasaryan wrote:
> > > > On Thu, Jan 25, 2024 at 2:23 AM kernel test robot <lkp@intel.com> wrote:
> > > > >
> > > > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > > > head:   01af33cc9894b4489fb68fa35c40e9fe85df63dc
> > > > > commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU
> > > > > config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/config)
> > > > > compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> > > > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/reproduce)
> > > > >
> > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > the same patch/commit), kindly add following tags
> > > > > | Reported-by: kernel test robot <lkp@intel.com>
> > > > > | Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@intel.com/
> > > > >
> > > > > sparse warnings: (new ones prefixed by >>)
> > > > > >> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct file [noderef] __rcu **f @@     got struct file ** @@
> > > >
> > > > Uh, this is a problem.
> > > > I missed that get_file_rcu() is used only with mm->exe_file and
> > > > vma->vm_file is not really RCU-safe. It's freed via a call to fput()
> > > > which schedules its freeing using schedule_delayed_work(..., 1) but I
> > > > don't think that constitutes RCU grace period. Paul, Matthew, could
> > > > you please confirm? In the meantime I'm going to ask Andrew to remove
> > > > my patchset from mm-unstable to be safe.
> > >
> > > Sadly, no, schedule_delayed_work() does not imply an RCU grace period.
> > >
> > > There is a queue_rcu_work() that schedules work after a grace period,
> > > which could be combined with a timer to get the delay.
> > >
> > > Another approach would be to use get_state_synchronize_rcu() before
> > > the schedule_delayed_work() in fput(), then do cond_synchronize_rcu()
> > > in delayed_fput().  This would require adding an unsigned long to
> > > struct file to keep track of which grace period a given struct file
> > > needed to wait for.
> > >
> > > Perhaps something like this:
> > >
> > > ------------------------------------------------------------------------
> > >
> > > void fput(struct file *file)
> > > {
> > >         if (atomic_long_dec_and_test(&file->f_count)) {
> > >                 struct task_struct *task = current;
> > >
> > >                 if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> > >                         init_task_work(&file->f_rcuhead, ____fput);
> > >                         if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
> > >                                 return;
> > >                         /*
> > >                          * After this task has run exit_task_work(),
> > >                          * task_work_add() will fail.  Fall through to delayed
> > >                          * fput to avoid leaking *file.
> > >                          */
> > >                 }
> > >
> > >                 file->f_rcu_seq = get_state_synchronize_rcu();
> > >                 if (llist_add(&file->f_llist, &delayed_fput_list))
> > >                         schedule_delayed_work(&delayed_fput_work, 1);
> > >         }
> > > }
> > >
> > > ------------------------------------------------------------------------
> > >
> > > And this:
> > >
> > > ------------------------------------------------------------------------
> > >
> > > static void delayed_fput(struct work_struct *unused)
> > > {
> > >         struct llist_node *node = llist_del_all(&delayed_fput_list);
> > >         struct file *f, *t;
> > >
> > >         llist_for_each_entry_safe(f, t, node, f_llist) {
> > >                 cond_synchronize_rcu(f->f_rcu_seq);
> > >                 __fput(f);
> > >         }
> > > }
> > >
> > > ------------------------------------------------------------------------
> > >
> > > Note that if you called fput() on a long sequence of struct file
> > > structures, the cond_synchronize_rcu() would be a near-noop almost all the
> > > time, actually blocking at most about every once per every few jiffies.
> > > After all, once a grace period has been waited for, it covers all of
> > > the struct file structures that were passed to fput() during a given
> > > RCU grace period.
> > >
> > > Still, it would add the occasional delay.  And it would increase the
> > > size of struct file, though there are workarounds for that, if size
> > > is an issue.
> > >
> > > Thoughts?
> >
> > Thanks for the suggestion, Paul. I'm worried about this occasional
> > delay but otherwise this seems like a nice and simple approach.
>
> One potential saving grace is that the more heavily loaded the mechanism,
> the smaller a fraction of the cond_synchronize_rcu() calls will do
> a delay.
>
> >                                                                 Do you
> > guys think that making *all* files RCU-safe with this approach is
> > warranted? For my particular case I need only vma->vm_file to be
> > RCU-safe but maybe there are other cases which would benefit from
> > this?
>
> To this, I can only give an unqualified "I don't know".  :-(
>
> But if there is some condition that can be sampled on a per-file-structure
> basis, you could use that to invoke cond_synchronize_rcu() only when
> needed.  Or send only those file structures that need the extra delay
> through queue_rcu_work(), perhaps by accumulating a list of them.

Thanks Paul! You gave me enough food for thought. Let me see if I can
come up with something usable.
Cheers,
Suren.

>
>                                                         Thanx, Paul
>
> > > > >    fs/proc/task_mmu.c:143:45: sparse:     expected struct file [noderef] __rcu **f
> > > > >    fs/proc/task_mmu.c:143:45: sparse:     got struct file **
> > > > >    fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
> > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock
> > > > >    fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block
> > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock
> > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock
> > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock
> > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock
> > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock
> > > > >    fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
> > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > >
> > > > > vim +143 fs/proc/task_mmu.c
> > > > >
> > > > >    132
> > > > >    133  /*
> > > > >    134   * Take VMA snapshot and pin vm_file and anon_name as they are used by
> > > > >    135   * show_map_vma.
> > > > >    136   */
> > > > >    137  static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
> > > > >    138  {
> > > > >    139          struct vm_area_struct *copy = &priv->vma_copy;
> > > > >    140          int ret = -EAGAIN;
> > > > >    141
> > > > >    142          memcpy(copy, vma, sizeof(*vma));
> > > > >  > 143          if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> > > > >    144                  goto out;
> > > > >    145
> > > > >    146          if (!anon_vma_name_get_if_valid(copy))
> > > > >    147                  goto put_file;
> > > > >    148
> > > > >    149          if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm))
> > > > >    150                  return 0;
> > > > >    151
> > > > >    152          /* Address space got modified, vma might be stale. Wait and retry. */
> > > > >    153          rcu_read_unlock();
> > > > >    154          ret = mmap_read_lock_killable(priv->mm);
> > > > >    155          mmap_write_seq_record(priv->mm, &priv->mm_wr_seq);
> > > > >    156          mmap_read_unlock(priv->mm);
> > > > >    157          rcu_read_lock();
> > > > >    158
> > > > >    159          if (!ret)
> > > > >    160                  ret = -EAGAIN; /* no other errors, ok to retry */
> > > > >    161
> > > > >    162          anon_vma_name_put_if_valid(copy);
> > > > >    163  put_file:
> > > > >    164          if (copy->vm_file)
> > > > >    165                  fput(copy->vm_file);
> > > > >    166  out:
> > > > >    167          return ret;
> > > > >    168  }
> > > > >    169
> > > > >
> > > > > --
> > > > > 0-DAY CI Kernel Test Service
> > > > > https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces)
  2024-01-26  2:24         ` Suren Baghdasaryan
@ 2024-01-26 10:22           ` Paul E. McKenney
  2024-01-26 16:55             ` Suren Baghdasaryan
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2024-01-26 10:22 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: kernel test robot, Matthew Wilcox, oe-kbuild-all,
	Linux Memory Management List, Andrew Morton

On Thu, Jan 25, 2024 at 06:24:05PM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 25, 2024 at 4:35 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Thu, Jan 25, 2024 at 03:17:17PM -0800, Suren Baghdasaryan wrote:
> > > On Thu, Jan 25, 2024 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Thu, Jan 25, 2024 at 01:27:34PM -0800, Suren Baghdasaryan wrote:
> > > > > On Thu, Jan 25, 2024 at 2:23 AM kernel test robot <lkp@intel.com> wrote:
> > > > > >
> > > > > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > > > > head:   01af33cc9894b4489fb68fa35c40e9fe85df63dc
> > > > > > commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU
> > > > > > config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/config)
> > > > > > compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> > > > > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/reproduce)
> > > > > >
> > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > > the same patch/commit), kindly add following tags
> > > > > > | Reported-by: kernel test robot <lkp@intel.com>
> > > > > > | Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@intel.com/
> > > > > >
> > > > > > sparse warnings: (new ones prefixed by >>)
> > > > > > >> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct file [noderef] __rcu **f @@     got struct file ** @@
> > > > >
> > > > > Uh, this is a problem.
> > > > > I missed that get_file_rcu() is used only with mm->exe_file and
> > > > > vma->vm_file is not really RCU-safe. It's freed via a call to fput()
> > > > > which schedules its freeing using schedule_delayed_work(..., 1) but I
> > > > > don't think that constitutes RCU grace period. Paul, Matthew, could
> > > > > you please confirm? In the meantime I'm going to ask Andrew to remove
> > > > > my patchset from mm-unstable to be safe.
> > > >
> > > > Sadly, no, schedule_delayed_work() does not imply an RCU grace period.
> > > >
> > > > There is a queue_rcu_work() that schedules work after a grace period,
> > > > which could be combined with a timer to get the delay.
> > > >
> > > > Another approach would be to use get_state_synchronize_rcu() before
> > > > the schedule_delayed_work() in fput(), then do cond_synchronize_rcu()
> > > > in delayed_fput().  This would require adding an unsigned long to
> > > > struct file to keep track of which grace period a given struct file
> > > > needed to wait for.
> > > >
> > > > Perhaps something like this:
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > void fput(struct file *file)
> > > > {
> > > >         if (atomic_long_dec_and_test(&file->f_count)) {
> > > >                 struct task_struct *task = current;
> > > >
> > > >                 if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> > > >                         init_task_work(&file->f_rcuhead, ____fput);
> > > >                         if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
> > > >                                 return;
> > > >                         /*
> > > >                          * After this task has run exit_task_work(),
> > > >                          * task_work_add() will fail.  Fall through to delayed
> > > >                          * fput to avoid leaking *file.
> > > >                          */
> > > >                 }
> > > >
> > > >                 file->f_rcu_seq = get_state_synchronize_rcu();
> > > >                 if (llist_add(&file->f_llist, &delayed_fput_list))
> > > >                         schedule_delayed_work(&delayed_fput_work, 1);
> > > >         }
> > > > }
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > And this:
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > static void delayed_fput(struct work_struct *unused)
> > > > {
> > > >         struct llist_node *node = llist_del_all(&delayed_fput_list);
> > > >         struct file *f, *t;
> > > >
> > > >         llist_for_each_entry_safe(f, t, node, f_llist) {
> > > >                 cond_synchronize_rcu(f->f_rcu_seq);
> > > >                 __fput(f);
> > > >         }
> > > > }
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > Note that if you called fput() on a long sequence of struct file
> > > > structures, the cond_synchronize_rcu() would be a near-noop almost all the
> > > > time, actually blocking at most about every once per every few jiffies.
> > > > After all, once a grace period has been waited for, it covers all of
> > > > the struct file structures that were passed to fput() during a given
> > > > RCU grace period.
> > > >
> > > > Still, it would add the occasional delay.  And it would increase the
> > > > size of struct file, though there are workarounds for that, if size
> > > > is an issue.
> > > >
> > > > Thoughts?
> > >
> > > Thanks for the suggestion, Paul. I'm worried about this occasional
> > > delay but otherwise this seems like a nice and simple approach.
> >
> > One potential saving grace is that the more heavily loaded the mechanism,
> > the smaller a fraction of the cond_synchronize_rcu() calls will do
> > a delay.
> >
> > >                                                                 Do you
> > > guys think that making *all* files RCU-safe with this approach is
> > > warranted? For my particular case I need only vma->vm_file to be
> > > RCU-safe but maybe there are other cases which would benefit from
> > > this?
> >
> > To this, I can only give an unqualified "I don't know".  :-(
> >
> > But if there is some condition that can be sampled on a per-file-structure
> > basis, you could use that to invoke cond_synchronize_rcu() only when
> > needed.  Or send only those file structures that need the extra delay
> > through queue_rcu_work(), perhaps by accumulating a list of them.
> 
> Thanks Paul! You gave me enough food for thought. Let me see if I can
> come up with something usable.

Do we have the same problem with the task_work_add() path in fput()?

							Thanx, Paul

> Cheers,
> Suren.
> 
> >
> >                                                         Thanx, Paul
> >
> > > > > >    fs/proc/task_mmu.c:143:45: sparse:     expected struct file [noderef] __rcu **f
> > > > > >    fs/proc/task_mmu.c:143:45: sparse:     got struct file **
> > > > > >    fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
> > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock
> > > > > >    fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block
> > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock
> > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock
> > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock
> > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock
> > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock
> > > > > >    fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
> > > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > > >
> > > > > > vim +143 fs/proc/task_mmu.c
> > > > > >
> > > > > >    132
> > > > > >    133  /*
> > > > > >    134   * Take VMA snapshot and pin vm_file and anon_name as they are used by
> > > > > >    135   * show_map_vma.
> > > > > >    136   */
> > > > > >    137  static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
> > > > > >    138  {
> > > > > >    139          struct vm_area_struct *copy = &priv->vma_copy;
> > > > > >    140          int ret = -EAGAIN;
> > > > > >    141
> > > > > >    142          memcpy(copy, vma, sizeof(*vma));
> > > > > >  > 143          if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> > > > > >    144                  goto out;
> > > > > >    145
> > > > > >    146          if (!anon_vma_name_get_if_valid(copy))
> > > > > >    147                  goto put_file;
> > > > > >    148
> > > > > >    149          if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm))
> > > > > >    150                  return 0;
> > > > > >    151
> > > > > >    152          /* Address space got modified, vma might be stale. Wait and retry. */
> > > > > >    153          rcu_read_unlock();
> > > > > >    154          ret = mmap_read_lock_killable(priv->mm);
> > > > > >    155          mmap_write_seq_record(priv->mm, &priv->mm_wr_seq);
> > > > > >    156          mmap_read_unlock(priv->mm);
> > > > > >    157          rcu_read_lock();
> > > > > >    158
> > > > > >    159          if (!ret)
> > > > > >    160                  ret = -EAGAIN; /* no other errors, ok to retry */
> > > > > >    161
> > > > > >    162          anon_vma_name_put_if_valid(copy);
> > > > > >    163  put_file:
> > > > > >    164          if (copy->vm_file)
> > > > > >    165                  fput(copy->vm_file);
> > > > > >    166  out:
> > > > > >    167          return ret;
> > > > > >    168  }
> > > > > >    169
> > > > > >
> > > > > > --
> > > > > > 0-DAY CI Kernel Test Service
> > > > > > https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces)
  2024-01-26 10:22           ` Paul E. McKenney
@ 2024-01-26 16:55             ` Suren Baghdasaryan
  2024-01-26 19:27               ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2024-01-26 16:55 UTC (permalink / raw)
  To: paulmck
  Cc: kernel test robot, Matthew Wilcox, oe-kbuild-all,
	Linux Memory Management List, Andrew Morton

On Fri, Jan 26, 2024 at 2:22 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Thu, Jan 25, 2024 at 06:24:05PM -0800, Suren Baghdasaryan wrote:
> > On Thu, Jan 25, 2024 at 4:35 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Thu, Jan 25, 2024 at 03:17:17PM -0800, Suren Baghdasaryan wrote:
> > > > On Thu, Jan 25, 2024 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jan 25, 2024 at 01:27:34PM -0800, Suren Baghdasaryan wrote:
> > > > > > On Thu, Jan 25, 2024 at 2:23 AM kernel test robot <lkp@intel.com> wrote:
> > > > > > >
> > > > > > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > > > > > head:   01af33cc9894b4489fb68fa35c40e9fe85df63dc
> > > > > > > commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU
> > > > > > > config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/config)
> > > > > > > compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> > > > > > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/reproduce)
> > > > > > >
> > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > > > the same patch/commit), kindly add following tags
> > > > > > > | Reported-by: kernel test robot <lkp@intel.com>
> > > > > > > | Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@intel.com/
> > > > > > >
> > > > > > > sparse warnings: (new ones prefixed by >>)
> > > > > > > >> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct file [noderef] __rcu **f @@     got struct file ** @@
> > > > > >
> > > > > > Uh, this is a problem.
> > > > > > I missed that get_file_rcu() is used only with mm->exe_file and
> > > > > > vma->vm_file is not really RCU-safe. It's freed via a call to fput()
> > > > > > which schedules its freeing using schedule_delayed_work(..., 1) but I
> > > > > > don't think that constitutes RCU grace period. Paul, Matthew, could
> > > > > > you please confirm? In the meantime I'm going to ask Andrew to remove
> > > > > > my patchset from mm-unstable to be safe.
> > > > >
> > > > > Sadly, no, schedule_delayed_work() does not imply an RCU grace period.
> > > > >
> > > > > There is a queue_rcu_work() that schedules work after a grace period,
> > > > > which could be combined with a timer to get the delay.
> > > > >
> > > > > Another approach would be to use get_state_synchronize_rcu() before
> > > > > the schedule_delayed_work() in fput(), then do cond_synchronize_rcu()
> > > > > in delayed_fput().  This would require adding an unsigned long to
> > > > > struct file to keep track of which grace period a given struct file
> > > > > needed to wait for.
> > > > >
> > > > > Perhaps something like this:
> > > > >
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > void fput(struct file *file)
> > > > > {
> > > > >         if (atomic_long_dec_and_test(&file->f_count)) {
> > > > >                 struct task_struct *task = current;
> > > > >
> > > > >                 if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> > > > >                         init_task_work(&file->f_rcuhead, ____fput);
> > > > >                         if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
> > > > >                                 return;
> > > > >                         /*
> > > > >                          * After this task has run exit_task_work(),
> > > > >                          * task_work_add() will fail.  Fall through to delayed
> > > > >                          * fput to avoid leaking *file.
> > > > >                          */
> > > > >                 }
> > > > >
> > > > >                 file->f_rcu_seq = get_state_synchronize_rcu();
> > > > >                 if (llist_add(&file->f_llist, &delayed_fput_list))
> > > > >                         schedule_delayed_work(&delayed_fput_work, 1);
> > > > >         }
> > > > > }
> > > > >
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > And this:
> > > > >
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > static void delayed_fput(struct work_struct *unused)
> > > > > {
> > > > >         struct llist_node *node = llist_del_all(&delayed_fput_list);
> > > > >         struct file *f, *t;
> > > > >
> > > > >         llist_for_each_entry_safe(f, t, node, f_llist) {
> > > > >                 cond_synchronize_rcu(f->f_rcu_seq);
> > > > >                 __fput(f);
> > > > >         }
> > > > > }
> > > > >
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > Note that if you called fput() on a long sequence of struct file
> > > > > structures, the cond_synchronize_rcu() would be a near-noop almost all the
> > > > > time, actually blocking at most about every once per every few jiffies.
> > > > > After all, once a grace period has been waited for, it covers all of
> > > > > the struct file structures that were passed to fput() during a given
> > > > > RCU grace period.
> > > > >
> > > > > Still, it would add the occasional delay.  And it would increase the
> > > > > size of struct file, though there are workarounds for that, if size
> > > > > is an issue.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Thanks for the suggestion, Paul. I'm worried about this occasional
> > > > delay but otherwise this seems like a nice and simple approach.
> > >
> > > One potential saving grace is that the more heavily loaded the mechanism,
> > > the smaller a fraction of the cond_synchronize_rcu() calls will do
> > > a delay.
> > >
> > > >                                                                 Do you
> > > > guys think that making *all* files RCU-safe with this approach is
> > > > warranted? For my particular case I need only vma->vm_file to be
> > > > RCU-safe but maybe there are other cases which would benefit from
> > > > this?
> > >
> > > To this, I can only give an unqualified "I don't know".  :-(
> > >
> > > But if there is some condition that can be sampled on a per-file-structure
> > > basis, you could use that to invoke cond_synchronize_rcu() only when
> > > needed.  Or send only those file structures that need the extra delay
> > > through queue_rcu_work(), perhaps by accumulating a list of them.
> >
> > Thanks Paul! You gave me enough food for thought. Let me see if I can
> > come up with something usable.
>
> Do we have the same problem with the task_work_add() path in fput()?

Yes, I think so. It's called with TWA_RESUME, so the file will be
freed when the task returns to user mode, but that again does not
guarantee that RCU grace period is over.

>
>                                                         Thanx, Paul
>
> > Cheers,
> > Suren.
> >
> > >
> > >                                                         Thanx, Paul
> > >
> > > > > > >    fs/proc/task_mmu.c:143:45: sparse:     expected struct file [noderef] __rcu **f
> > > > > > >    fs/proc/task_mmu.c:143:45: sparse:     got struct file **
> > > > > > >    fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
> > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock
> > > > > > >    fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block
> > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock
> > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock
> > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock
> > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock
> > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock
> > > > > > >    fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
> > > > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > > > >
> > > > > > > vim +143 fs/proc/task_mmu.c
> > > > > > >
> > > > > > >    132
> > > > > > >    133  /*
> > > > > > >    134   * Take VMA snapshot and pin vm_file and anon_name as they are used by
> > > > > > >    135   * show_map_vma.
> > > > > > >    136   */
> > > > > > >    137  static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
> > > > > > >    138  {
> > > > > > >    139          struct vm_area_struct *copy = &priv->vma_copy;
> > > > > > >    140          int ret = -EAGAIN;
> > > > > > >    141
> > > > > > >    142          memcpy(copy, vma, sizeof(*vma));
> > > > > > >  > 143          if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> > > > > > >    144                  goto out;
> > > > > > >    145
> > > > > > >    146          if (!anon_vma_name_get_if_valid(copy))
> > > > > > >    147                  goto put_file;
> > > > > > >    148
> > > > > > >    149          if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm))
> > > > > > >    150                  return 0;
> > > > > > >    151
> > > > > > >    152          /* Address space got modified, vma might be stale. Wait and retry. */
> > > > > > >    153          rcu_read_unlock();
> > > > > > >    154          ret = mmap_read_lock_killable(priv->mm);
> > > > > > >    155          mmap_write_seq_record(priv->mm, &priv->mm_wr_seq);
> > > > > > >    156          mmap_read_unlock(priv->mm);
> > > > > > >    157          rcu_read_lock();
> > > > > > >    158
> > > > > > >    159          if (!ret)
> > > > > > >    160                  ret = -EAGAIN; /* no other errors, ok to retry */
> > > > > > >    161
> > > > > > >    162          anon_vma_name_put_if_valid(copy);
> > > > > > >    163  put_file:
> > > > > > >    164          if (copy->vm_file)
> > > > > > >    165                  fput(copy->vm_file);
> > > > > > >    166  out:
> > > > > > >    167          return ret;
> > > > > > >    168  }
> > > > > > >    169
> > > > > > >
> > > > > > > --
> > > > > > > 0-DAY CI Kernel Test Service
> > > > > > > https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces)
  2024-01-26 16:55             ` Suren Baghdasaryan
@ 2024-01-26 19:27               ` Paul E. McKenney
  2024-01-27 18:02                 ` Suren Baghdasaryan
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2024-01-26 19:27 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: kernel test robot, Matthew Wilcox, oe-kbuild-all,
	Linux Memory Management List, Andrew Morton

On Fri, Jan 26, 2024 at 08:55:38AM -0800, Suren Baghdasaryan wrote:
> On Fri, Jan 26, 2024 at 2:22 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Thu, Jan 25, 2024 at 06:24:05PM -0800, Suren Baghdasaryan wrote:
> > > On Thu, Jan 25, 2024 at 4:35 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Thu, Jan 25, 2024 at 03:17:17PM -0800, Suren Baghdasaryan wrote:
> > > > > On Thu, Jan 25, 2024 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Jan 25, 2024 at 01:27:34PM -0800, Suren Baghdasaryan wrote:
> > > > > > > On Thu, Jan 25, 2024 at 2:23 AM kernel test robot <lkp@intel.com> wrote:
> > > > > > > >
> > > > > > > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > > > > > > head:   01af33cc9894b4489fb68fa35c40e9fe85df63dc
> > > > > > > > commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU
> > > > > > > > config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/config)
> > > > > > > > compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> > > > > > > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/reproduce)
> > > > > > > >
> > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > > > > the same patch/commit), kindly add following tags
> > > > > > > > | Reported-by: kernel test robot <lkp@intel.com>
> > > > > > > > | Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@intel.com/
> > > > > > > >
> > > > > > > > sparse warnings: (new ones prefixed by >>)
> > > > > > > > >> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct file [noderef] __rcu **f @@     got struct file ** @@
> > > > > > >
> > > > > > > Uh, this is a problem.
> > > > > > > I missed that get_file_rcu() is used only with mm->exe_file and
> > > > > > > vma->vm_file is not really RCU-safe. It's freed via a call to fput()
> > > > > > > which schedules its freeing using schedule_delayed_work(..., 1) but I
> > > > > > > don't think that constitutes RCU grace period. Paul, Matthew, could
> > > > > > > you please confirm? In the meantime I'm going to ask Andrew to remove
> > > > > > > my patchset from mm-unstable to be safe.
> > > > > >
> > > > > > Sadly, no, schedule_delayed_work() does not imply an RCU grace period.
> > > > > >
> > > > > > There is a queue_rcu_work() that schedules work after a grace period,
> > > > > > which could be combined with a timer to get the delay.
> > > > > >
> > > > > > Another approach would be to use get_state_synchronize_rcu() before
> > > > > > the schedule_delayed_work() in fput(), then do cond_synchronize_rcu()
> > > > > > in delayed_fput().  This would require adding an unsigned long to
> > > > > > struct file to keep track of which grace period a given struct file
> > > > > > needed to wait for.
> > > > > >
> > > > > > Perhaps something like this:
> > > > > >
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > void fput(struct file *file)
> > > > > > {
> > > > > >         if (atomic_long_dec_and_test(&file->f_count)) {
> > > > > >                 struct task_struct *task = current;
> > > > > >
> > > > > >                 if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> > > > > >                         init_task_work(&file->f_rcuhead, ____fput);
> > > > > >                         if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
> > > > > >                                 return;
> > > > > >                         /*
> > > > > >                          * After this task has run exit_task_work(),
> > > > > >                          * task_work_add() will fail.  Fall through to delayed
> > > > > >                          * fput to avoid leaking *file.
> > > > > >                          */
> > > > > >                 }
> > > > > >
> > > > > >                 file->f_rcu_seq = get_state_synchronize_rcu();
> > > > > >                 if (llist_add(&file->f_llist, &delayed_fput_list))
> > > > > >                         schedule_delayed_work(&delayed_fput_work, 1);
> > > > > >         }
> > > > > > }
> > > > > >
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > And this:
> > > > > >
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > static void delayed_fput(struct work_struct *unused)
> > > > > > {
> > > > > >         struct llist_node *node = llist_del_all(&delayed_fput_list);
> > > > > >         struct file *f, *t;
> > > > > >
> > > > > >         llist_for_each_entry_safe(f, t, node, f_llist) {
> > > > > >                 cond_synchronize_rcu(f->f_rcu_seq);
> > > > > >                 __fput(f);
> > > > > >         }
> > > > > > }
> > > > > >
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > Note that if you called fput() on a long sequence of struct file
> > > > > > structures, the cond_synchronize_rcu() would be a near-noop almost all the
> > > > > > time, actually blocking at most about every once per every few jiffies.
> > > > > > After all, once a grace period has been waited for, it covers all of
> > > > > > the struct file structures that were passed to fput() during a given
> > > > > > RCU grace period.
> > > > > >
> > > > > > Still, it would add the occasional delay.  And it would increase the
> > > > > > size of struct file, though there are workarounds for that, if size
> > > > > > is an issue.
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > Thanks for the suggestion, Paul. I'm worried about this occasional
> > > > > delay but otherwise this seems like a nice and simple approach.
> > > >
> > > > One potential saving grace is that the more heavily loaded the mechanism,
> > > > the smaller a fraction of the cond_synchronize_rcu() calls will do
> > > > a delay.
> > > >
> > > > >                                                                 Do you
> > > > > guys think that making *all* files RCU-safe with this approach is
> > > > > warranted? For my particular case I need only vma->vm_file to be
> > > > > RCU-safe but maybe there are other cases which would benefit from
> > > > > this?
> > > >
> > > > To this, I can only give an unqualified "I don't know".  :-(
> > > >
> > > > But if there is some condition that can be sampled on a per-file-structure
> > > > basis, you could use that to invoke cond_synchronize_rcu() only when
> > > > needed.  Or send only those file structures that need the extra delay
> > > > through queue_rcu_work(), perhaps by accumulating a list of them.
> > >
> > > Thanks Paul! You gave me enough food for thought. Let me see if I can
> > > come up with something usable.
> >
> > Do we have the same problem with the task_work_add() path in fput()?
> 
> Yes, I think so. It's called with TWA_RESUME, so the file will be
> freed when the task returns to user mode, but that again does not
> guarantee that RCU grace period is over.

That one is going to be annoying, as there doesn't seem to be a nice place
to hide grace-period latency.  The fget() function and friends want an
FD, so I am not seeing an obvious way of grabbing an explicit reference.

Now the fget() code path is protected by RCU.  But maybe that does not
apply in the case where the file is closed but still mapped?

                                                        Thanx, Paul

> > > Cheers,
> > > Suren.
> > >
> > > >
> > > >                                                         Thanx, Paul
> > > >
> > > > > > > >    fs/proc/task_mmu.c:143:45: sparse:     expected struct file [noderef] __rcu **f
> > > > > > > >    fs/proc/task_mmu.c:143:45: sparse:     got struct file **
> > > > > > > >    fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
> > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock
> > > > > > > >    fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block
> > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock
> > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock
> > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock
> > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock
> > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock
> > > > > > > >    fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
> > > > > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > > > > >
> > > > > > > > vim +143 fs/proc/task_mmu.c
> > > > > > > >
> > > > > > > >    132
> > > > > > > >    133  /*
> > > > > > > >    134   * Take VMA snapshot and pin vm_file and anon_name as they are used by
> > > > > > > >    135   * show_map_vma.
> > > > > > > >    136   */
> > > > > > > >    137  static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
> > > > > > > >    138  {
> > > > > > > >    139          struct vm_area_struct *copy = &priv->vma_copy;
> > > > > > > >    140          int ret = -EAGAIN;
> > > > > > > >    141
> > > > > > > >    142          memcpy(copy, vma, sizeof(*vma));
> > > > > > > >  > 143          if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> > > > > > > >    144                  goto out;
> > > > > > > >    145
> > > > > > > >    146          if (!anon_vma_name_get_if_valid(copy))
> > > > > > > >    147                  goto put_file;
> > > > > > > >    148
> > > > > > > >    149          if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm))
> > > > > > > >    150                  return 0;
> > > > > > > >    151
> > > > > > > >    152          /* Address space got modified, vma might be stale. Wait and retry. */
> > > > > > > >    153          rcu_read_unlock();
> > > > > > > >    154          ret = mmap_read_lock_killable(priv->mm);
> > > > > > > >    155          mmap_write_seq_record(priv->mm, &priv->mm_wr_seq);
> > > > > > > >    156          mmap_read_unlock(priv->mm);
> > > > > > > >    157          rcu_read_lock();
> > > > > > > >    158
> > > > > > > >    159          if (!ret)
> > > > > > > >    160                  ret = -EAGAIN; /* no other errors, ok to retry */
> > > > > > > >    161
> > > > > > > >    162          anon_vma_name_put_if_valid(copy);
> > > > > > > >    163  put_file:
> > > > > > > >    164          if (copy->vm_file)
> > > > > > > >    165                  fput(copy->vm_file);
> > > > > > > >    166  out:
> > > > > > > >    167          return ret;
> > > > > > > >    168  }
> > > > > > > >    169
> > > > > > > >
> > > > > > > > --
> > > > > > > > 0-DAY CI Kernel Test Service
> > > > > > > > https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces)
  2024-01-26 19:27               ` Paul E. McKenney
@ 2024-01-27 18:02                 ` Suren Baghdasaryan
  2024-01-27 19:17                   ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2024-01-27 18:02 UTC (permalink / raw)
  To: paulmck
  Cc: kernel test robot, Matthew Wilcox, oe-kbuild-all,
	Linux Memory Management List, Andrew Morton

On Fri, Jan 26, 2024 at 11:27 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Jan 26, 2024 at 08:55:38AM -0800, Suren Baghdasaryan wrote:
> > On Fri, Jan 26, 2024 at 2:22 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Thu, Jan 25, 2024 at 06:24:05PM -0800, Suren Baghdasaryan wrote:
> > > > On Thu, Jan 25, 2024 at 4:35 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jan 25, 2024 at 03:17:17PM -0800, Suren Baghdasaryan wrote:
> > > > > > On Thu, Jan 25, 2024 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > >
> > > > > > > On Thu, Jan 25, 2024 at 01:27:34PM -0800, Suren Baghdasaryan wrote:
> > > > > > > > On Thu, Jan 25, 2024 at 2:23 AM kernel test robot <lkp@intel.com> wrote:
> > > > > > > > >
> > > > > > > > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > > > > > > > head:   01af33cc9894b4489fb68fa35c40e9fe85df63dc
> > > > > > > > > commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU
> > > > > > > > > config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/config)
> > > > > > > > > compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> > > > > > > > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/reproduce)
> > > > > > > > >
> > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > > > > > the same patch/commit), kindly add following tags
> > > > > > > > > | Reported-by: kernel test robot <lkp@intel.com>
> > > > > > > > > | Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@intel.com/
> > > > > > > > >
> > > > > > > > > sparse warnings: (new ones prefixed by >>)
> > > > > > > > > >> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct file [noderef] __rcu **f @@     got struct file ** @@
> > > > > > > >
> > > > > > > > Uh, this is a problem.
> > > > > > > > I missed that get_file_rcu() is used only with mm->exe_file and
> > > > > > > > vma->vm_file is not really RCU-safe. It's freed via a call to fput()
> > > > > > > > which schedules its freeing using schedule_delayed_work(..., 1) but I
> > > > > > > > don't think that constitutes RCU grace period. Paul, Matthew, could
> > > > > > > > you please confirm? In the meantime I'm going to ask Andrew to remove
> > > > > > > > my patchset from mm-unstable to be safe.
> > > > > > >
> > > > > > > Sadly, no, schedule_delayed_work() does not imply an RCU grace period.
> > > > > > >
> > > > > > > There is a queue_rcu_work() that schedules work after a grace period,
> > > > > > > which could be combined with a timer to get the delay.
> > > > > > >
> > > > > > > Another approach would be to use get_state_synchronize_rcu() before
> > > > > > > the schedule_delayed_work() in fput(), then do cond_synchronize_rcu()
> > > > > > > in delayed_fput().  This would require adding an unsigned long to
> > > > > > > struct file to keep track of which grace period a given struct file
> > > > > > > needed to wait for.
> > > > > > >
> > > > > > > Perhaps something like this:
> > > > > > >
> > > > > > > ------------------------------------------------------------------------
> > > > > > >
> > > > > > > void fput(struct file *file)
> > > > > > > {
> > > > > > >         if (atomic_long_dec_and_test(&file->f_count)) {
> > > > > > >                 struct task_struct *task = current;
> > > > > > >
> > > > > > >                 if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> > > > > > >                         init_task_work(&file->f_rcuhead, ____fput);
> > > > > > >                         if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
> > > > > > >                                 return;
> > > > > > >                         /*
> > > > > > >                          * After this task has run exit_task_work(),
> > > > > > >                          * task_work_add() will fail.  Fall through to delayed
> > > > > > >                          * fput to avoid leaking *file.
> > > > > > >                          */
> > > > > > >                 }
> > > > > > >
> > > > > > >                 file->f_rcu_seq = get_state_synchronize_rcu();
> > > > > > >                 if (llist_add(&file->f_llist, &delayed_fput_list))
> > > > > > >                         schedule_delayed_work(&delayed_fput_work, 1);
> > > > > > >         }
> > > > > > > }
> > > > > > >
> > > > > > > ------------------------------------------------------------------------
> > > > > > >
> > > > > > > And this:
> > > > > > >
> > > > > > > ------------------------------------------------------------------------
> > > > > > >
> > > > > > > static void delayed_fput(struct work_struct *unused)
> > > > > > > {
> > > > > > >         struct llist_node *node = llist_del_all(&delayed_fput_list);
> > > > > > >         struct file *f, *t;
> > > > > > >
> > > > > > >         llist_for_each_entry_safe(f, t, node, f_llist) {
> > > > > > >                 cond_synchronize_rcu(f->f_rcu_seq);
> > > > > > >                 __fput(f);
> > > > > > >         }
> > > > > > > }
> > > > > > >
> > > > > > > ------------------------------------------------------------------------
> > > > > > >
> > > > > > > Note that if you called fput() on a long sequence of struct file
> > > > > > > structures, the cond_synchronize_rcu() would be a near-noop almost all the
> > > > > > > time, actually blocking at most about every once per every few jiffies.
> > > > > > > After all, once a grace period has been waited for, it covers all of
> > > > > > > the struct file structures that were passed to fput() during a given
> > > > > > > RCU grace period.
> > > > > > >
> > > > > > > Still, it would add the occasional delay.  And it would increase the
> > > > > > > size of struct file, though there are workarounds for that, if size
> > > > > > > is an issue.
> > > > > > >
> > > > > > > Thoughts?
> > > > > >
> > > > > > Thanks for the suggestion, Paul. I'm worried about this occasional
> > > > > > delay but otherwise this seems like a nice and simple approach.
> > > > >
> > > > > One potential saving grace is that the more heavily loaded the mechanism,
> > > > > the smaller a fraction of the cond_synchronize_rcu() calls will do
> > > > > a delay.
> > > > >
> > > > > >                                                                 Do you
> > > > > > guys think that making *all* files RCU-safe with this approach is
> > > > > > warranted? For my particular case I need only vma->vm_file to be
> > > > > > RCU-safe but maybe there are other cases which would benefit from
> > > > > > this?
> > > > >
> > > > > To this, I can only give an unqualified "I don't know".  :-(
> > > > >
> > > > > But if there is some condition that can be sampled on a per-file-structure
> > > > > basis, you could use that to invoke cond_synchronize_rcu() only when
> > > > > needed.  Or send only those file structures that need the extra delay
> > > > > through queue_rcu_work(), perhaps by accumulating a list of them.
> > > >
> > > > Thanks Paul! You gave me enough food for thought. Let me see if I can
> > > > come up with something usable.
> > >
> > > Do we have the same problem with the task_work_add() path in fput()?
> >
> > Yes, I think so. It's called with TWA_RESUME, so the file will be
> > freed when the task returns to user mode, but that again does not
> > guarantee that RCU grace period is over.
>
> That one is going to be annoying, as there doesn't seem to be a nice place
> to hide grace-period latency.  The fget() function and friends want an
> FD, so I am not seeing an obvious way of grabbing an explicit reference.

Yeah, the more I'm thinking about this, the more I'm leaning towards
simply write-locking the VMA to stabilize it for the duration of
outputting its data. This would make the code much simpler, we don't
need to make any VMA members RCU-safe, we don't need to take a VMA
snapshot, we don't introduce any regressions in potentially more
important areas and it solves the problem we want to solve. The
problem is: /proc/pid/maps reader blocking a more important writer.
With the VMA locking approach, the only time this can happen is if the
writer tries to modify a VMA while we are printing its information.
And even then it will be blocked only for a short duration because
once we are done printing that VMA's info, we unlock it. I think it's
worth at least trying. WDYT?

>
> Now the fget() code path is protected by RCU.  But maybe that does not
> apply in the case where the file is closed but still mapped?
>
>                                                         Thanx, Paul
>
> > > > Cheers,
> > > > Suren.
> > > >
> > > > >
> > > > >                                                         Thanx, Paul
> > > > >
> > > > > > > > >    fs/proc/task_mmu.c:143:45: sparse:     expected struct file [noderef] __rcu **f
> > > > > > > > >    fs/proc/task_mmu.c:143:45: sparse:     got struct file **
> > > > > > > > >    fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
> > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock
> > > > > > > > >    fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block
> > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock
> > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock
> > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock
> > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock
> > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock
> > > > > > > > >    fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
> > > > > > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > > > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > > > > > >
> > > > > > > > > vim +143 fs/proc/task_mmu.c
> > > > > > > > >
> > > > > > > > >    132
> > > > > > > > >    133  /*
> > > > > > > > >    134   * Take VMA snapshot and pin vm_file and anon_name as they are used by
> > > > > > > > >    135   * show_map_vma.
> > > > > > > > >    136   */
> > > > > > > > >    137  static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
> > > > > > > > >    138  {
> > > > > > > > >    139          struct vm_area_struct *copy = &priv->vma_copy;
> > > > > > > > >    140          int ret = -EAGAIN;
> > > > > > > > >    141
> > > > > > > > >    142          memcpy(copy, vma, sizeof(*vma));
> > > > > > > > >  > 143          if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> > > > > > > > >    144                  goto out;
> > > > > > > > >    145
> > > > > > > > >    146          if (!anon_vma_name_get_if_valid(copy))
> > > > > > > > >    147                  goto put_file;
> > > > > > > > >    148
> > > > > > > > >    149          if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm))
> > > > > > > > >    150                  return 0;
> > > > > > > > >    151
> > > > > > > > >    152          /* Address space got modified, vma might be stale. Wait and retry. */
> > > > > > > > >    153          rcu_read_unlock();
> > > > > > > > >    154          ret = mmap_read_lock_killable(priv->mm);
> > > > > > > > >    155          mmap_write_seq_record(priv->mm, &priv->mm_wr_seq);
> > > > > > > > >    156          mmap_read_unlock(priv->mm);
> > > > > > > > >    157          rcu_read_lock();
> > > > > > > > >    158
> > > > > > > > >    159          if (!ret)
> > > > > > > > >    160                  ret = -EAGAIN; /* no other errors, ok to retry */
> > > > > > > > >    161
> > > > > > > > >    162          anon_vma_name_put_if_valid(copy);
> > > > > > > > >    163  put_file:
> > > > > > > > >    164          if (copy->vm_file)
> > > > > > > > >    165                  fput(copy->vm_file);
> > > > > > > > >    166  out:
> > > > > > > > >    167          return ret;
> > > > > > > > >    168  }
> > > > > > > > >    169
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 0-DAY CI Kernel Test Service
> > > > > > > > > https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces)
  2024-01-27 18:02                 ` Suren Baghdasaryan
@ 2024-01-27 19:17                   ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2024-01-27 19:17 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: kernel test robot, Matthew Wilcox, oe-kbuild-all,
	Linux Memory Management List, Andrew Morton

On Sat, Jan 27, 2024 at 10:02:29AM -0800, Suren Baghdasaryan wrote:
> On Fri, Jan 26, 2024 at 11:27 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Jan 26, 2024 at 08:55:38AM -0800, Suren Baghdasaryan wrote:
> > > On Fri, Jan 26, 2024 at 2:22 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Thu, Jan 25, 2024 at 06:24:05PM -0800, Suren Baghdasaryan wrote:
> > > > > On Thu, Jan 25, 2024 at 4:35 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Jan 25, 2024 at 03:17:17PM -0800, Suren Baghdasaryan wrote:
> > > > > > > On Thu, Jan 25, 2024 at 2:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Thu, Jan 25, 2024 at 01:27:34PM -0800, Suren Baghdasaryan wrote:
> > > > > > > > > On Thu, Jan 25, 2024 at 2:23 AM kernel test robot <lkp@intel.com> wrote:
> > > > > > > > > >
> > > > > > > > > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > > > > > > > > head:   01af33cc9894b4489fb68fa35c40e9fe85df63dc
> > > > > > > > > > commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU
> > > > > > > > > > config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/config)
> > > > > > > > > > compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> > > > > > > > > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@intel.com/reproduce)
> > > > > > > > > >
> > > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > > > > > > the same patch/commit), kindly add following tags
> > > > > > > > > > | Reported-by: kernel test robot <lkp@intel.com>
> > > > > > > > > > | Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@intel.com/
> > > > > > > > > >
> > > > > > > > > > sparse warnings: (new ones prefixed by >>)
> > > > > > > > > > >> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct file [noderef] __rcu **f @@     got struct file ** @@
> > > > > > > > >
> > > > > > > > > Uh, this is a problem.
> > > > > > > > > I missed that get_file_rcu() is used only with mm->exe_file and
> > > > > > > > > vma->vm_file is not really RCU-safe. It's freed via a call to fput()
> > > > > > > > > which schedules its freeing using schedule_delayed_work(..., 1) but I
> > > > > > > > > don't think that constitutes RCU grace period. Paul, Matthew, could
> > > > > > > > > you please confirm? In the meantime I'm going to ask Andrew to remove
> > > > > > > > > my patchset from mm-unstable to be safe.
> > > > > > > >
> > > > > > > > Sadly, no, schedule_delayed_work() does not imply an RCU grace period.
> > > > > > > >
> > > > > > > > There is a queue_rcu_work() that schedules work after a grace period,
> > > > > > > > which could be combined with a timer to get the delay.
> > > > > > > >
> > > > > > > > Another approach would be to use get_state_synchronize_rcu() before
> > > > > > > > the schedule_delayed_work() in fput(), then do cond_synchronize_rcu()
> > > > > > > > in delayed_fput().  This would require adding an unsigned long to
> > > > > > > > struct file to keep track of which grace period a given struct file
> > > > > > > > needed to wait for.
> > > > > > > >
> > > > > > > > Perhaps something like this:
> > > > > > > >
> > > > > > > > ------------------------------------------------------------------------
> > > > > > > >
> > > > > > > > void fput(struct file *file)
> > > > > > > > {
> > > > > > > >         if (atomic_long_dec_and_test(&file->f_count)) {
> > > > > > > >                 struct task_struct *task = current;
> > > > > > > >
> > > > > > > >                 if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> > > > > > > >                         init_task_work(&file->f_rcuhead, ____fput);
> > > > > > > >                         if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
> > > > > > > >                                 return;
> > > > > > > >                         /*
> > > > > > > >                          * After this task has run exit_task_work(),
> > > > > > > >                          * task_work_add() will fail.  Fall through to delayed
> > > > > > > >                          * fput to avoid leaking *file.
> > > > > > > >                          */
> > > > > > > >                 }
> > > > > > > >
> > > > > > > >                 file->f_rcu_seq = get_state_synchronize_rcu();
> > > > > > > >                 if (llist_add(&file->f_llist, &delayed_fput_list))
> > > > > > > >                         schedule_delayed_work(&delayed_fput_work, 1);
> > > > > > > >         }
> > > > > > > > }
> > > > > > > >
> > > > > > > > ------------------------------------------------------------------------
> > > > > > > >
> > > > > > > > And this:
> > > > > > > >
> > > > > > > > ------------------------------------------------------------------------
> > > > > > > >
> > > > > > > > static void delayed_fput(struct work_struct *unused)
> > > > > > > > {
> > > > > > > >         struct llist_node *node = llist_del_all(&delayed_fput_list);
> > > > > > > >         struct file *f, *t;
> > > > > > > >
> > > > > > > >         llist_for_each_entry_safe(f, t, node, f_llist) {
> > > > > > > >                 cond_synchronize_rcu(f->f_rcu_seq);
> > > > > > > >                 __fput(f);
> > > > > > > >         }
> > > > > > > > }
> > > > > > > >
> > > > > > > > ------------------------------------------------------------------------
> > > > > > > >
> > > > > > > > Note that if you called fput() on a long sequence of struct file
> > > > > > > > structures, the cond_synchronize_rcu() would be a near-noop almost all the
> > > > > > > > time, actually blocking at most about every once per every few jiffies.
> > > > > > > > After all, once a grace period has been waited for, it covers all of
> > > > > > > > the struct file structures that were passed to fput() during a given
> > > > > > > > RCU grace period.
> > > > > > > >
> > > > > > > > Still, it would add the occasional delay.  And it would increase the
> > > > > > > > size of struct file, though there are workarounds for that, if size
> > > > > > > > is an issue.
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > >
> > > > > > > Thanks for the suggestion, Paul. I'm worried about this occasional
> > > > > > > delay but otherwise this seems like a nice and simple approach.
> > > > > >
> > > > > > One potential saving grace is that the more heavily loaded the mechanism,
> > > > > > the smaller a fraction of the cond_synchronize_rcu() calls will do
> > > > > > a delay.
> > > > > >
> > > > > > >                                                                 Do you
> > > > > > > guys think that making *all* files RCU-safe with this approach is
> > > > > > > warranted? For my particular case I need only vma->vm_file to be
> > > > > > > RCU-safe but maybe there are other cases which would benefit from
> > > > > > > this?
> > > > > >
> > > > > > To this, I can only give an unqualified "I don't know".  :-(
> > > > > >
> > > > > > But if there is some condition that can be sampled on a per-file-structure
> > > > > > basis, you could use that to invoke cond_synchronize_rcu() only when
> > > > > > needed.  Or send only those file structures that need the extra delay
> > > > > > through queue_rcu_work(), perhaps by accumulating a list of them.
> > > > >
> > > > > Thanks Paul! You gave me enough food for thought. Let me see if I can
> > > > > come up with something usable.
> > > >
> > > > Do we have the same problem with the task_work_add() path in fput()?
> > >
> > > Yes, I think so. It's called with TWA_RESUME, so the file will be
> > > freed when the task returns to user mode, but that again does not
> > > guarantee that RCU grace period is over.
> >
> > That one is going to be annoying, as there doesn't seem to be a nice place
> > to hide grace-period latency.  The fget() function and friends want an
> > FD, so I am not seeing an obvious way of grabbing an explicit reference.
> 
> Yeah, the more I'm thinking about this, the more I'm leaning towards
> simply write-locking the VMA to stabilize it for the duration of
> outputting its data. This would make the code much simpler, we don't
> need to make any VMA members RCU-safe, we don't need to take a VMA
> snapshot, we don't introduce any regressions in potentially more
> important areas and it solves the problem we want to solve. The
> problem is: /proc/pid/maps reader blocking a more important writer.
> With the VMA locking approach, the only time this can happen is if the
> writer tries to modify a VMA while we are printing its information.
> And even then it will be blocked only for a short duration because
> once we are done printing that VMA's info, we unlock it. I think it's
> worth at least trying. WDYT?

The only other thing that comes immediately to mind is to cache the
needed struct file information in the VMA.  Except that this information
includes ->f_path.  I also looked into using a dentry or inode, both
of which are RCU-protected, but I don't see a reasonable way to get the
needed information.

So I cannot argue against doing this VMA by VMA, as that will at least
make the problem less frequent, especially for applications having
large numbers of VMAs.

							Thanx, Paul

> > Now the fget() code path is protected by RCU.  But maybe that does not
> > apply in the case where the file is closed but still mapped?
> >
> >                                                         Thanx, Paul
> >
> > > > > Cheers,
> > > > > Suren.
> > > > >
> > > > > >
> > > > > >                                                         Thanx, Paul
> > > > > >
> > > > > > > > > >    fs/proc/task_mmu.c:143:45: sparse:     expected struct file [noderef] __rcu **f
> > > > > > > > > >    fs/proc/task_mmu.c:143:45: sparse:     got struct file **
> > > > > > > > > >    fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
> > > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock
> > > > > > > > > >    fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block
> > > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock
> > > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock
> > > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock
> > > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock
> > > > > > > > > >    include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock
> > > > > > > > > >    fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
> > > > > > > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > > > > > > >    arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
> > > > > > > > > >
> > > > > > > > > > vim +143 fs/proc/task_mmu.c
> > > > > > > > > >
> > > > > > > > > >    132
> > > > > > > > > >    133  /*
> > > > > > > > > >    134   * Take VMA snapshot and pin vm_file and anon_name as they are used by
> > > > > > > > > >    135   * show_map_vma.
> > > > > > > > > >    136   */
> > > > > > > > > >    137  static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma)
> > > > > > > > > >    138  {
> > > > > > > > > >    139          struct vm_area_struct *copy = &priv->vma_copy;
> > > > > > > > > >    140          int ret = -EAGAIN;
> > > > > > > > > >    141
> > > > > > > > > >    142          memcpy(copy, vma, sizeof(*vma));
> > > > > > > > > >  > 143          if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> > > > > > > > > >    144                  goto out;
> > > > > > > > > >    145
> > > > > > > > > >    146          if (!anon_vma_name_get_if_valid(copy))
> > > > > > > > > >    147                  goto put_file;
> > > > > > > > > >    148
> > > > > > > > > >    149          if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm))
> > > > > > > > > >    150                  return 0;
> > > > > > > > > >    151
> > > > > > > > > >    152          /* Address space got modified, vma might be stale. Wait and retry. */
> > > > > > > > > >    153          rcu_read_unlock();
> > > > > > > > > >    154          ret = mmap_read_lock_killable(priv->mm);
> > > > > > > > > >    155          mmap_write_seq_record(priv->mm, &priv->mm_wr_seq);
> > > > > > > > > >    156          mmap_read_unlock(priv->mm);
> > > > > > > > > >    157          rcu_read_lock();
> > > > > > > > > >    158
> > > > > > > > > >    159          if (!ret)
> > > > > > > > > >    160                  ret = -EAGAIN; /* no other errors, ok to retry */
> > > > > > > > > >    161
> > > > > > > > > >    162          anon_vma_name_put_if_valid(copy);
> > > > > > > > > >    163  put_file:
> > > > > > > > > >    164          if (copy->vm_file)
> > > > > > > > > >    165                  fput(copy->vm_file);
> > > > > > > > > >    166  out:
> > > > > > > > > >    167          return ret;
> > > > > > > > > >    168  }
> > > > > > > > > >    169
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 0-DAY CI Kernel Test Service
> > > > > > > > > > https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-01-27 19:17 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-25 10:22 [linux-next:master 1589/1892] fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) kernel test robot
2024-01-25 21:27 ` Suren Baghdasaryan
2024-01-25 22:44   ` Paul E. McKenney
2024-01-25 23:17     ` Suren Baghdasaryan
2024-01-26  0:35       ` Paul E. McKenney
2024-01-26  2:24         ` Suren Baghdasaryan
2024-01-26 10:22           ` Paul E. McKenney
2024-01-26 16:55             ` Suren Baghdasaryan
2024-01-26 19:27               ` Paul E. McKenney
2024-01-27 18:02                 ` Suren Baghdasaryan
2024-01-27 19:17                   ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.