* [PATCH] mm: workingset: replace IRQ-off check with a lockdep assert.
@ 2019-02-11  9:57 Sebastian Andrzej Siewior
  2019-02-11 11:38 ` [PATCH v2] " Sebastian Andrzej Siewior
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-11  9:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Johannes Weiner, Peter Zijlstra, Andrew Morton, Thomas Gleixner

Commit

  68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")

introduced an IRQ-off check to ensure that a lock is held which also
disables interrupts. This does not work the same way on -RT because none
of the locks that are held disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/workingset.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index dcb994f2acc2e..c75d10d48be16 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -368,6 +368,8 @@ static struct list_lru shadow_nodes;
 
 void workingset_update_node(struct xa_node *node)
 {
+	struct address_space *mapping;
+
 	/*
 	 * Track non-empty nodes that contain only shadow entries;
 	 * unlink those that contain pages or are being freed.
@@ -376,7 +378,8 @@ void workingset_update_node(struct xa_node *node)
 	 * already where they should be. The list_empty() test is safe
 	 * as node->private_list is protected by the i_pages lock.
 	 */
-	VM_WARN_ON_ONCE(!irqs_disabled());  /* For __inc_lruvec_page_state */
+	mapping = container_of(node->array, struct address_space, i_pages);
+	lockdep_is_held(&mapping->i_pages.xa_lock);
 
 	if (node->count && node->count == node->nr_values) {
 		if (list_empty(&node->private_list)) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-02-11  9:57 [PATCH] mm: workingset: replace IRQ-off check with a lockdep assert Sebastian Andrzej Siewior
@ 2019-02-11 11:38 ` Sebastian Andrzej Siewior
  2019-02-11 18:53   ` Johannes Weiner
  2019-02-11 17:07 ` [PATCH] " kbuild test robot
  2019-02-11 17:37 ` kbuild test robot
  2 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-11 11:38 UTC (permalink / raw)
  To: linux-mm; +Cc: Johannes Weiner, Peter Zijlstra, Andrew Morton, Thomas Gleixner

Commit

  68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")

introduced an IRQ-off check to ensure that a lock is held which also
disables interrupts. This does not work the same way on -RT because none
of the locks that are held disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2: lockdep_is_held() => lockdep_assert_held()

 mm/workingset.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -368,6 +368,8 @@ static struct list_lru shadow_nodes;
 
 void workingset_update_node(struct xa_node *node)
 {
+	struct address_space *mapping;
+
 	/*
 	 * Track non-empty nodes that contain only shadow entries;
 	 * unlink those that contain pages or are being freed.
@@ -376,7 +378,8 @@ void workingset_update_node(struct xa_no
 	 * already where they should be. The list_empty() test is safe
 	 * as node->private_list is protected by the i_pages lock.
 	 */
-	VM_WARN_ON_ONCE(!irqs_disabled());  /* For __inc_lruvec_page_state */
+	mapping = container_of(node->array, struct address_space, i_pages);
+	lockdep_assert_held(&mapping->i_pages.xa_lock);
 
 	if (node->count && node->count == node->nr_values) {
 		if (list_empty(&node->private_list)) {



* Re: [PATCH] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-02-11  9:57 [PATCH] mm: workingset: replace IRQ-off check with a lockdep assert Sebastian Andrzej Siewior
  2019-02-11 11:38 ` [PATCH v2] " Sebastian Andrzej Siewior
@ 2019-02-11 17:07 ` kbuild test robot
  2019-02-11 17:37 ` kbuild test robot
  2 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2019-02-11 17:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: kbuild-all, linux-mm, Johannes Weiner, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 2448 bytes --]

Hi Sebastian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.0-rc4 next-20190211]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sebastian-Andrzej-Siewior/mm-workingset-replace-IRQ-off-check-with-a-lockdep-assert/20190212-001418
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 6.0.0 20160327 (experimental)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All errors (new ones prefixed by >>):

   mm/workingset.c: In function 'workingset_update_node':
>> mm/workingset.c:382:2: error: implicit declaration of function 'lockdep_is_held' [-Werror=implicit-function-declaration]
     lockdep_is_held(&mapping->i_pages.xa_lock);
     ^~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/lockdep_is_held +382 mm/workingset.c

   368	
   369	void workingset_update_node(struct xa_node *node)
   370	{
   371		struct address_space *mapping;
   372	
   373		/*
   374		 * Track non-empty nodes that contain only shadow entries;
   375		 * unlink those that contain pages or are being freed.
   376		 *
   377		 * Avoid acquiring the list_lru lock when the nodes are
   378		 * already where they should be. The list_empty() test is safe
   379		 * as node->private_list is protected by the i_pages lock.
   380		 */
   381		mapping = container_of(node->array, struct address_space, i_pages);
 > 382		lockdep_is_held(&mapping->i_pages.xa_lock);
   383	
   384		if (node->count && node->count == node->nr_values) {
   385			if (list_empty(&node->private_list)) {
   386				list_lru_add(&shadow_nodes, &node->private_list);
   387				__inc_lruvec_page_state(virt_to_page(node),
   388							WORKINGSET_NODES);
   389			}
   390		} else {
   391			if (!list_empty(&node->private_list)) {
   392				list_lru_del(&shadow_nodes, &node->private_list);
   393				__dec_lruvec_page_state(virt_to_page(node),
   394							WORKINGSET_NODES);
   395			}
   396		}
   397	}
   398	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7863 bytes --]


* Re: [PATCH] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-02-11  9:57 [PATCH] mm: workingset: replace IRQ-off check with a lockdep assert Sebastian Andrzej Siewior
  2019-02-11 11:38 ` [PATCH v2] " Sebastian Andrzej Siewior
  2019-02-11 17:07 ` [PATCH] " kbuild test robot
@ 2019-02-11 17:37 ` kbuild test robot
  2 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2019-02-11 17:37 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: kbuild-all, linux-mm, Johannes Weiner, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 2324 bytes --]

Hi Sebastian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.0-rc4 next-20190211]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sebastian-Andrzej-Siewior/mm-workingset-replace-IRQ-off-check-with-a-lockdep-assert/20190212-001418
config: i386-randconfig-x076-201906 (attached as .config)
compiler: gcc-8 (Debian 8.2.0-20) 8.2.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   mm/workingset.c: In function 'workingset_update_node':
>> mm/workingset.c:382:2: error: implicit declaration of function 'lockdep_is_held'; did you mean 'lockdep_is_held_type'? [-Werror=implicit-function-declaration]
     lockdep_is_held(&mapping->i_pages.xa_lock);
     ^~~~~~~~~~~~~~~
     lockdep_is_held_type
   cc1: some warnings being treated as errors

vim +382 mm/workingset.c

   368	
   369	void workingset_update_node(struct xa_node *node)
   370	{
   371		struct address_space *mapping;
   372	
   373		/*
   374		 * Track non-empty nodes that contain only shadow entries;
   375		 * unlink those that contain pages or are being freed.
   376		 *
   377		 * Avoid acquiring the list_lru lock when the nodes are
   378		 * already where they should be. The list_empty() test is safe
   379		 * as node->private_list is protected by the i_pages lock.
   380		 */
   381		mapping = container_of(node->array, struct address_space, i_pages);
 > 382		lockdep_is_held(&mapping->i_pages.xa_lock);
   383	
   384		if (node->count && node->count == node->nr_values) {
   385			if (list_empty(&node->private_list)) {
   386				list_lru_add(&shadow_nodes, &node->private_list);
   387				__inc_lruvec_page_state(virt_to_page(node),
   388							WORKINGSET_NODES);
   389			}
   390		} else {
   391			if (!list_empty(&node->private_list)) {
   392				list_lru_del(&shadow_nodes, &node->private_list);
   393				__dec_lruvec_page_state(virt_to_page(node),
   394							WORKINGSET_NODES);
   395			}
   396		}
   397	}
   398	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33357 bytes --]


* Re: [PATCH v2] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-02-11 11:38 ` [PATCH v2] " Sebastian Andrzej Siewior
@ 2019-02-11 18:53   ` Johannes Weiner
  2019-02-11 19:13     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Weiner @ 2019-02-11 18:53 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-mm, Peter Zijlstra, Andrew Morton, Thomas Gleixner

On Mon, Feb 11, 2019 at 12:38:29PM +0100, Sebastian Andrzej Siewior wrote:
> Commit
> 
>   68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")
> 
> introduced an IRQ-off check to ensure that a lock is held which also
> disabled interrupts. This does not work the same way on -RT because none
> of the locks, that are held, disable interrupts.
> Replace this check with a lockdep assert which ensures that the lock is
> held.
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

I'm not against checking for the lock, but if IRQs aren't disabled,
what ensures __mod_lruvec_state() is safe? I'm guessing it's because
preemption is disabled and irq handlers are punted to process context.

That said, it seems weird to me that

	spin_lock_irqsave();
	BUG_ON(!irqs_disabled());
	spin_unlock_irqrestore();

would trigger. Wouldn't it make sense to have a raw_irqs_disabled() or
something and keep the irqs_disabled() abstraction layer intact?



* Re: [PATCH v2] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-02-11 18:53   ` Johannes Weiner
@ 2019-02-11 19:13     ` Sebastian Andrzej Siewior
  2019-02-11 19:17       ` Matthew Wilcox
  2019-02-11 21:02       ` Johannes Weiner
  0 siblings, 2 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-11 19:13 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: linux-mm, Peter Zijlstra, Andrew Morton, Thomas Gleixner

On 2019-02-11 13:53:18 [-0500], Johannes Weiner wrote:
> On Mon, Feb 11, 2019 at 12:38:29PM +0100, Sebastian Andrzej Siewior wrote:
> > Commit
> > 
> >   68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")
> > 
> > introduced an IRQ-off check to ensure that a lock is held which also
> > disabled interrupts. This does not work the same way on -RT because none
> > of the locks, that are held, disable interrupts.
> > Replace this check with a lockdep assert which ensures that the lock is
> > held.
> > 
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> I'm not against checking for the lock, but if IRQs aren't disabled,
> what ensures __mod_lruvec_state() is safe?

how do you define safe? I've been looking for dependencies of
__mod_lruvec_state() but found only that the lock is held during the RMW
operation with WORKINGSET_NODES idx.

>                                            I'm guessing it's because
> preemption is disabled and irq handlers are punted to process context.
preemption is enabled and IRQs are processed in forced-threaded mode.

> That said, it seems weird to me that
> 
> 	spin_lock_irqsave();
> 	BUG_ON(!irqs_disabled());
> 	spin_unlock_irqrestore();
> 
> would trigger. Wouldn't it make sense to have a raw_irqs_disabled() or
> something and keep the irqs_disabled() abstraction layer intact?

maybe if I know why interrupts should be disabled in the first place.
The ->i_pages lock is never acquired with disabled interrupts so it
should be safe to proceed as-is. Should there be a spot in -RT where the
lock is acquired with disabled interrupts then lockdep would scream. And
then we would have to decide to either move everything raw_ locks (and
live with the consequences) or avoid acquiring the lock with disabled
interrupts.

Sebastian



* Re: [PATCH v2] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-02-11 19:13     ` Sebastian Andrzej Siewior
@ 2019-02-11 19:17       ` Matthew Wilcox
  2019-02-11 19:41         ` Sebastian Andrzej Siewior
  2019-02-11 21:02       ` Johannes Weiner
  1 sibling, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2019-02-11 19:17 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Johannes Weiner, linux-mm, Peter Zijlstra, Andrew Morton,
	Thomas Gleixner

On Mon, Feb 11, 2019 at 08:13:45PM +0100, Sebastian Andrzej Siewior wrote:
> On 2019-02-11 13:53:18 [-0500], Johannes Weiner wrote:
> > I'm not against checking for the lock, but if IRQs aren't disabled,
> > what ensures __mod_lruvec_state() is safe?
> 
> how do you define safe? I've been looking for dependencies of
> __mod_lruvec_state() but found only that the lock is held during the RMW
> operation with WORKINGSET_NODES idx.
> 
> >                                            I'm guessing it's because
> > preemption is disabled and irq handlers are punted to process context.
> preemption is enabled and IRQ are processed in forced-threaded mode.
> 
> > That said, it seems weird to me that
> > 
> > 	spin_lock_irqsave();
> > 	BUG_ON(!irqs_disabled());
> > 	spin_unlock_irqrestore();
> > 
> > would trigger. Wouldn't it make sense to have a raw_irqs_disabled() or
> > something and keep the irqs_disabled() abstraction layer intact?
> 
> maybe if I know why interrupts should be disabled in the first place.
> The ->i_pages lock is never acquired with disabled interrupts so it
> should be safe to proceed as-is. Should there be a spot in -RT where the
> lock is acquired with disabled interrupts then lockdep would scream. And
> then we would have to decide to either move everything raw_ locks (and
> live with the consequences) or avoid acquiring the lock with disabled
> interrupts.

I think you mean "the i_pages lock is never acquired with interrupts
enabled".  Lockdep would scream if it were -- you'd be in a situation
where an interrupt handler which acquired the i_pages lock could deadlock
against you.



* Re: [PATCH v2] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-02-11 19:17       ` Matthew Wilcox
@ 2019-02-11 19:41         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-11 19:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Johannes Weiner, linux-mm, Peter Zijlstra, Andrew Morton,
	Thomas Gleixner

On 2019-02-11 11:17:45 [-0800], Matthew Wilcox wrote:
> On Mon, Feb 11, 2019 at 08:13:45PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2019-02-11 13:53:18 [-0500], Johannes Weiner wrote:
> > > I'm not against checking for the lock, but if IRQs aren't disabled,
> > > what ensures __mod_lruvec_state() is safe?
> > 
> > how do you define safe? I've been looking for dependencies of
> > __mod_lruvec_state() but found only that the lock is held during the RMW
> > operation with WORKINGSET_NODES idx.
> > 
> > >                                            I'm guessing it's because
> > > preemption is disabled and irq handlers are punted to process context.
> > preemption is enabled and IRQ are processed in forced-threaded mode.
> > 
> > > That said, it seems weird to me that
> > > 
> > > 	spin_lock_irqsave();
> > > 	BUG_ON(!irqs_disabled());
> > > 	spin_unlock_irqrestore();
> > > 
> > > would trigger. Wouldn't it make sense to have a raw_irqs_disabled() or
> > > something and keep the irqs_disabled() abstraction layer intact?
> > 
> > maybe if I know why interrupts should be disabled in the first place.
> > The ->i_pages lock is never acquired with disabled interrupts so it
> > should be safe to proceed as-is. Should there be a spot in -RT where the
> > lock is acquired with disabled interrupts then lockdep would scream. And
> > then we would have to decide to either move everything raw_ locks (and
> > live with the consequences) or avoid acquiring the lock with disabled
> > interrupts.
> 
> I think you mean "the i_pages lock is never acquired with interrupts
> enabled".  Lockdep would scream if it were -- you'd be in a situation
> where an interrupt handler which acquired the i_pages lock could deadlock
> against you.
With RT enabled the i_pages lock is always acquired with interrupts
enabled because spin_lock_irq() does not disable interrupts.

Sebastian



* Re: [PATCH v2] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-02-11 19:13     ` Sebastian Andrzej Siewior
  2019-02-11 19:17       ` Matthew Wilcox
@ 2019-02-11 21:02       ` Johannes Weiner
  2019-02-13  9:27         ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 13+ messages in thread
From: Johannes Weiner @ 2019-02-11 21:02 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-mm, Peter Zijlstra, Andrew Morton, Thomas Gleixner

On Mon, Feb 11, 2019 at 08:13:45PM +0100, Sebastian Andrzej Siewior wrote:
> On 2019-02-11 13:53:18 [-0500], Johannes Weiner wrote:
> > On Mon, Feb 11, 2019 at 12:38:29PM +0100, Sebastian Andrzej Siewior wrote:
> > > Commit
> > > 
> > >   68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")
> > > 
> > > introduced an IRQ-off check to ensure that a lock is held which also
> > > disabled interrupts. This does not work the same way on -RT because none
> > > of the locks, that are held, disable interrupts.
> > > Replace this check with a lockdep assert which ensures that the lock is
> > > held.
> > > 
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > 
> > I'm not against checking for the lock, but if IRQs aren't disabled,
> > what ensures __mod_lruvec_state() is safe?
> 
> how do you define safe? I've been looking for dependencies of
> __mod_lruvec_state() but found only that the lock is held during the RMW
> operation with WORKINGSET_NODES idx.

These stat functions are not allowed to nest, and the executing thread
cannot migrate to another CPU during the operation, otherwise they
corrupt the state they're modifying.

They are called from interrupt handlers, such as when NR_WRITEBACK is
decreased. Thus workingset_update_node() must exclude preemption from
irq handlers on the local CPU.

They rely on IRQ-disabling to also disable CPU migration.

> >                                            I'm guessing it's because
> > preemption is disabled and irq handlers are punted to process context.
> preemption is enabled and IRQ are processed in forced-threaded mode.

That doesn't sound safe.



* Re: [PATCH v2] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-02-11 21:02       ` Johannes Weiner
@ 2019-02-13  9:27         ` Sebastian Andrzej Siewior
  2019-02-13 14:56           ` Johannes Weiner
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-02-13  9:27 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: linux-mm, Peter Zijlstra, Andrew Morton, Thomas Gleixner

On 2019-02-11 16:02:08 [-0500], Johannes Weiner wrote:
> > how do you define safe? I've been looking for dependencies of
> > __mod_lruvec_state() but found only that the lock is held during the RMW
> > operation with WORKINGSET_NODES idx.
> 
> These stat functions are not allowed to nest, and the executing thread
> cannot migrate to another CPU during the operation, otherwise they
> corrupt the state they're modifying.

If everyone is taking the same lock (like i_pages.xa_lock) then there
will not be two instances updating the same stat. The owner of the
(sleeping)-spinlock will not be migrated to another CPU.

> They are called from interrupt handlers, such as when NR_WRITEBACK is
> decreased. Thus workingset_node_update() must exclude preemption from
> irq handlers on the local CPU.

Do you have an example for a code path to check NR_WRITEBACK?
 
> They rely on IRQ-disabling to also disable CPU migration.
The spinlock disables CPU migration. 

> > >                                            I'm guessing it's because
> > > preemption is disabled and irq handlers are punted to process context.
> > preemption is enabled and IRQ are processed in forced-threaded mode.
> 
> That doesn't sound safe.

Do you have a test case or something I could throw at it to verify that
this still works? So far nothing complains…

Sebastian



* Re: [PATCH v2] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-02-13  9:27         ` Sebastian Andrzej Siewior
@ 2019-02-13 14:56           ` Johannes Weiner
  2019-08-21 11:21             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Weiner @ 2019-02-13 14:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-mm, Peter Zijlstra, Andrew Morton, Thomas Gleixner

On Wed, Feb 13, 2019 at 10:27:54AM +0100, Sebastian Andrzej Siewior wrote:
> On 2019-02-11 16:02:08 [-0500], Johannes Weiner wrote:
> > > how do you define safe? I've been looking for dependencies of
> > > __mod_lruvec_state() but found only that the lock is held during the RMW
> > > operation with WORKINGSET_NODES idx.
> > 
> > These stat functions are not allowed to nest, and the executing thread
> > cannot migrate to another CPU during the operation, otherwise they
> > corrupt the state they're modifying.
> 
> If everyone is taking the same lock (like i_pages.xa_lock) then there
> will not be two instances updating the same stat. The owner of the
> (sleeping)-spinlock will not be migrated to another CPU.

This might be true for this particular stat item, but they are general
VM statistics. They're assuredly not all taking the xa_lock.

> > They are called from interrupt handlers, such as when NR_WRITEBACK is
> > decreased. Thus workingset_node_update() must exclude preemption from
> > irq handlers on the local CPU.
> 
> Do you have an example for a code path to check NR_WRITEBACK?

end_page_writeback()
 test_clear_page_writeback()
   dec_lruvec_state(lruvec, NR_WRITEBACK)

> > They rely on IRQ-disabling to also disable CPU migration.
> The spinlock disables CPU migration. 
> 
> > > >                                            I'm guessing it's because
> > > > preemption is disabled and irq handlers are punted to process context.
> > > preemption is enabled and IRQ are processed in forced-threaded mode.
> > 
> > That doesn't sound safe.
> 
> Do you have test-case or something I could throw at it and verify that
> this still works? So far nothing complains…

It's not easy to get the timing right on purpose, but we've seen in
production what happens when you don't protect these counter updates
from interrupts. See c3cc39118c36 ("mm: memcontrol: fix NR_WRITEBACK
leak in memcg and system stats").



* Re: [PATCH v2] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-02-13 14:56           ` Johannes Weiner
@ 2019-08-21 11:21             ` Sebastian Andrzej Siewior
  2019-08-21 15:21               ` Johannes Weiner
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-08-21 11:21 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: linux-mm, Peter Zijlstra, Andrew Morton, Thomas Gleixner

sorry, I somehow forgot about this…

On 2019-02-13 09:56:56 [-0500], Johannes Weiner wrote:
> On Wed, Feb 13, 2019 at 10:27:54AM +0100, Sebastian Andrzej Siewior wrote:
> > On 2019-02-11 16:02:08 [-0500], Johannes Weiner wrote:
> > > > how do you define safe? I've been looking for dependencies of
> > > > __mod_lruvec_state() but found only that the lock is held during the RMW
> > > > operation with WORKINGSET_NODES idx.
> > > 
> > > These stat functions are not allowed to nest, and the executing thread
> > > cannot migrate to another CPU during the operation, otherwise they
> > > corrupt the state they're modifying.
> > 
> > If everyone is taking the same lock (like i_pages.xa_lock) then there
> > will not be two instances updating the same stat. The owner of the
> > (sleeping)-spinlock will not be migrated to another CPU.
> 
> This might be true for this particular stat item, but they are general
> VM statistics. They're assuredly not all taking the xa_lock.

This one in particular does, and my guess is that interrupts are
disabled here because of xa_lock. So the question is: why should
interrupts be disabled? Is it due to the lock that should have been
acquired (and as such disables interrupts) _or_ because of the
*_lruvec_slab_state() operation itself?

> > > They are called from interrupt handlers, such as when NR_WRITEBACK is
> > > decreased. Thus workingset_node_update() must exclude preemption from
> > > irq handlers on the local CPU.
> > 
> > Do you have an example for a code path to check NR_WRITEBACK?
> 
> end_page_writeback()
>  test_clear_page_writeback()
>    dec_lruvec_state(lruvec, NR_WRITEBACK)

So with a warning in dec_lruvec_state() I found only a call path from
softirq (like scsi_io_completion() / bio_endio()). Having a lockdep
annotation instead of "just" preempt_disable() would have helped :)

> > > They rely on IRQ-disabling to also disable CPU migration.
> > The spinlock disables CPU migration. 
> > 
> > > > >                                            I'm guessing it's because
> > > > > preemption is disabled and irq handlers are punted to process context.
> > > > preemption is enabled and IRQ are processed in forced-threaded mode.
> > > 
> > > That doesn't sound safe.
> > 
> > Do you have test-case or something I could throw at it and verify that
> > this still works? So far nothing complains…
> 
> It's not easy to get the timing right on purpose, but we've seen in
> production what happens when you don't protect these counter updates
> from interrupts. See c3cc39118c36 ("mm: memcontrol: fix NR_WRITEBACK
> leak in memcg and system stats").

Based on the code I'm looking at, it looks fine. Should I just
resubmit the patch?

Sebastian



* Re: [PATCH v2] mm: workingset: replace IRQ-off check with a lockdep assert.
  2019-08-21 11:21             ` Sebastian Andrzej Siewior
@ 2019-08-21 15:21               ` Johannes Weiner
  0 siblings, 0 replies; 13+ messages in thread
From: Johannes Weiner @ 2019-08-21 15:21 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-mm, Peter Zijlstra, Andrew Morton, Thomas Gleixner

On Wed, Aug 21, 2019 at 01:21:16PM +0200, Sebastian Andrzej Siewior wrote:
> sorry, I somehow forgot about this…
> 
> On 2019-02-13 09:56:56 [-0500], Johannes Weiner wrote:
> > On Wed, Feb 13, 2019 at 10:27:54AM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2019-02-11 16:02:08 [-0500], Johannes Weiner wrote:
> > > > > how do you define safe? I've been looking for dependencies of
> > > > > __mod_lruvec_state() but found only that the lock is held during the RMW
> > > > > operation with WORKINGSET_NODES idx.
> > > > 
> > > > These stat functions are not allowed to nest, and the executing thread
> > > > cannot migrate to another CPU during the operation, otherwise they
> > > > corrupt the state they're modifying.
> > > 
> > > If everyone is taking the same lock (like i_pages.xa_lock) then there
> > > will not be two instances updating the same stat. The owner of the
> > > (sleeping)-spinlock will not be migrated to another CPU.
> > 
> > This might be true for this particular stat item, but they are general
> > VM statistics. They're assuredly not all taking the xa_lock.
> 
> This one in particular does and my guess is that the interrupts are
> disabled here because of xa_lock. So the question is why should the
> interrupts be disabled? Is this due to the lock that should have been
> acquired (and as such disable interrupts) _or_ because of the
> *_lruvec_slab_state() operation.
> 
> > > > They are called from interrupt handlers, such as when NR_WRITEBACK is
> > > > decreased. Thus workingset_node_update() must exclude preemption from
> > > > irq handlers on the local CPU.
> > > 
> > > Do you have an example for a code path to check NR_WRITEBACK?
> > 
> > end_page_writeback()
> >  test_clear_page_writeback()
> >    dec_lruvec_state(lruvec, NR_WRITEBACK)
> 
> So with a warning in dec_lruvec_state() I found only a call path from
> softirq (like scsi_io_completion() / bio_endio()). Having lockdep
> annotation instead "just" preempt_disable() would have helped :)
> 
> > > > They rely on IRQ-disabling to also disable CPU migration.
> > > The spinlock disables CPU migration. 
> > > 
> > > > > >                                            I'm guessing it's because
> > > > > > preemption is disabled and irq handlers are punted to process context.
> > > > > preemption is enabled and IRQ are processed in forced-threaded mode.
> > > > 
> > > > That doesn't sound safe.
> > > 
> > > Do you have test-case or something I could throw at it and verify that
> > > this still works? So far nothing complains…
> > 
> > It's not easy to get the timing right on purpose, but we've seen in
> > production what happens when you don't protect these counter updates
> > from interrupts. See c3cc39118c36 ("mm: memcontrol: fix NR_WRITEBACK
> > leak in memcg and system stats").
> 
> Based on the looking code I'm looking at, it looks fine. Should I just
> resubmit the patch?

No, NAK to this patch and others like it for the mm code.

The serialization scheme for the vmstats facilty is that stats can be
modified from interrupt context, and so they rely on interrupts being
disabled. This check is correct.

If you want to comprehensively change the scheme, you're of course
welcome to propose that, and I won't be in your way. But that includes
review and update of *all* participants, from the mutation points that
disable irqs (mod_zone_page_state() and friends) to the execution
context of all callstacks, including the full block layer.

What we are NOT doing is eliminating checks that correctly verify the
current locking scheme. We've seen race conditions in this code that
took millions of machine hours to trigger when the rules were broken,
so we rely on explicit checks during code development. It's also not
surprising that they're the only thing that triggers in your testing.

Making this work correctly for RT needs a more thoughtful approach.



end of thread, other threads:[~2019-08-21 15:22 UTC | newest]

Thread overview: 13+ messages
2019-02-11  9:57 [PATCH] mm: workingset: replace IRQ-off check with a lockdep assert Sebastian Andrzej Siewior
2019-02-11 11:38 ` [PATCH v2] " Sebastian Andrzej Siewior
2019-02-11 18:53   ` Johannes Weiner
2019-02-11 19:13     ` Sebastian Andrzej Siewior
2019-02-11 19:17       ` Matthew Wilcox
2019-02-11 19:41         ` Sebastian Andrzej Siewior
2019-02-11 21:02       ` Johannes Weiner
2019-02-13  9:27         ` Sebastian Andrzej Siewior
2019-02-13 14:56           ` Johannes Weiner
2019-08-21 11:21             ` Sebastian Andrzej Siewior
2019-08-21 15:21               ` Johannes Weiner
2019-02-11 17:07 ` [PATCH] " kbuild test robot
2019-02-11 17:37 ` kbuild test robot
