* [RFC PATCH] mm: move xa forward when run across zombie page
@ 2022-10-14  5:30 zhaoyang.huang
  2022-10-14 12:11 ` Matthew Wilcox
  0 siblings, 1 reply; 35+ messages in thread
From: zhaoyang.huang @ 2022-10-14  5:30 UTC (permalink / raw)
  To: Andrew Morton, Matthew Wilcox, Zhaoyang Huang, linux-mm,
	linux-kernel, ke.wang, steve.kang, baocong.liu, linux-fsdevel

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

The RCU stall below is reported when kswapd gets stuck in a live lock while
shrinking a superblock's inode list. The direct cause is a zombie page that
keeps occupying its slot in the xarray, which makes the check-and-retry loop
spin permanently. The root cause is not known yet; it is suspected to be an xa
update without synchronize_rcu etc. As a workaround, I would like to suggest
skipping this page to break the live lock.
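As a rough illustration only (a user-space model, not kernel code: the names
model_folio, try_get, and walk are made up here), the live lock and the
proposed skip can be sketched like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A "zombie" entry: refcount already 0, but still visible in the tree. */
struct model_folio {
	int refcount;
	size_t index;
	size_t nr_pages;
};

/* Mimics folio_try_get_rcu(): a speculative get fails once the count is 0. */
static bool try_get(struct model_folio *f)
{
	if (f->refcount == 0)
		return false;
	f->refcount++;
	return true;
}

/*
 * Walk an array of slots and count the entries we manage to pin.  Without
 * skip_zombies the walk retries the same slot forever, so the retry count
 * is capped here to stand in for the RCU stall.
 */
static int walk(struct model_folio **slots, size_t nslots, bool skip_zombies,
		int *retries_out)
{
	int pinned = 0, retries = 0;
	size_t i = 0;

	while (i < nslots) {
		struct model_folio *f = slots[i];

		if (!f) {
			i++;
			continue;
		}
		if (!try_get(f)) {
			if (skip_zombies) {
				/* What the patch's xas_advance() does: step
				 * past every subpage of the zombie folio. */
				i = f->index + f->nr_pages;
				continue;
			}
			if (++retries > 1000)	/* stand-in for the stall */
				break;
			continue;		/* "goto repeat" hits the
						 * same zombie again */
		}
		pinned++;
		i++;
	}
	*retries_out = retries;
	return pinned;
}
```

With skipping enabled the walk pins the live entries and never retries; with
the current code it burns all its retries on the zombie slot.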

[167222.620296] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167285.640296] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167348.660296] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167411.680296] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167474.700296] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167537.720299] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167600.740296] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167663.760298] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167726.780298] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167789.800297] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167726.780305] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P155
[167726.780319] (detected by 3, t=17256977 jiffies, g=19883597, q=2397394)
[167726.780325] task:kswapd0         state:R  running task     stack:   24 pid:  155 ppid:     2 flags:0x00000008
[167789.800308] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P155
[167789.800322] (detected by 3, t=17272732 jiffies, g=19883597, q=2397470)
[167789.800328] task:kswapd0         state:R  running task     stack:   24 pid:  155 ppid:     2 flags:0x00000008
[167789.800339] Call trace:
[167789.800342]  dump_backtrace.cfi_jt+0x0/0x8
[167789.800355]  show_stack+0x1c/0x2c
[167789.800363]  sched_show_task+0x1ac/0x27c
[167789.800370]  print_other_cpu_stall+0x314/0x4dc
[167789.800377]  check_cpu_stall+0x1c4/0x36c
[167789.800382]  rcu_sched_clock_irq+0xe8/0x388
[167789.800389]  update_process_times+0xa0/0xe0
[167789.800396]  tick_sched_timer+0x7c/0xd4
[167789.800404]  __run_hrtimer+0xd8/0x30c
[167789.800408]  hrtimer_interrupt+0x1e4/0x2d0
[167789.800414]  arch_timer_handler_phys+0x5c/0xa0
[167789.800423]  handle_percpu_devid_irq+0xbc/0x318
[167789.800430]  handle_domain_irq+0x7c/0xf0
[167789.800437]  gic_handle_irq+0x54/0x12c
[167789.800445]  call_on_irq_stack+0x40/0x70
[167789.800451]  do_interrupt_handler+0x44/0xa0
[167789.800457]  el1_interrupt+0x34/0x64
[167789.800464]  el1h_64_irq_handler+0x1c/0x2c
[167789.800470]  el1h_64_irq+0x7c/0x80
[167789.800474]  xas_find+0xb4/0x28c
[167789.800481]  find_get_entry+0x3c/0x178
[167789.800487]  find_lock_entries+0x98/0x2f8
[167789.800492]  __invalidate_mapping_pages.llvm.3657204692649320853+0xc8/0x224
[167789.800500]  invalidate_mapping_pages+0x18/0x28
[167789.800506]  inode_lru_isolate+0x140/0x2a4
[167789.800512]  __list_lru_walk_one+0xd8/0x204
[167789.800519]  list_lru_walk_one+0x64/0x90
[167789.800524]  prune_icache_sb+0x54/0xe0
[167789.800529]  super_cache_scan+0x160/0x1ec
[167789.800535]  do_shrink_slab+0x20c/0x5c0
[167789.800541]  shrink_slab+0xf0/0x20c
[167789.800546]  shrink_node_memcgs+0x98/0x320
[167789.800553]  shrink_node+0xe8/0x45c
[167789.800557]  balance_pgdat+0x464/0x814
[167789.800563]  kswapd+0xfc/0x23c
[167789.800567]  kthread+0x164/0x1c8
[167789.800573]  ret_from_fork+0x10/0x20

Signed-off-by: Baocong Liu <baocong.liu@unisoc.com>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 mm/filemap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 15800334..25b0a2e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2019,8 +2019,10 @@ static inline struct folio *find_get_entry(struct xa_state *xas, pgoff_t max,
 	if (!folio || xa_is_value(folio))
 		return folio;
 
-	if (!folio_try_get_rcu(folio))
+	if (!folio_try_get_rcu(folio)) {
+		xas_advance(xas, folio->index + folio_nr_pages(folio) - 1);
 		goto reset;
+	}
 
 	if (unlikely(folio != xas_reload(xas))) {
 		folio_put(folio);
-- 
1.9.1


* Re: [RFC PATCH] mm: move xa forward when run across zombie page
@ 2022-10-21 21:37 Pulavarty, Badari
  2022-10-21 22:31 ` Matthew Wilcox
  0 siblings, 1 reply; 35+ messages in thread
From: Pulavarty, Badari @ 2022-10-21 21:37 UTC (permalink / raw)
  To: david
  Cc: akpm, bfoster, huangzhaoyang, ke.wang, linux-fsdevel,
	linux-kernel, linux-mm, willy, zhaoyang.huang, Shutemov, Kirill,
	Tang, Feng, Huang, Ying, Yin, Fengwei, Hansen, Dave, Zanussi,
	Tom

Hi All,

I have been tracking a similar issue, with soft lockups or panics occurring consistently on my system under my workload.
I have tried multiple kernel versions; the issue reproduces consistently on 6.1-rc1 (and also seems to happen on 5.17, 5.19, and 6.0.x).

PANIC: "Kernel panic - not syncing: softlockup: hung tasks"

    RIP: 0000000000000001  RSP: ff3d8e7f0d9978ea  RFLAGS: ff3d8e7f0d9978e8
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 000000006b9c66f1  RSI: ff506ca15ff33c20  RDI: 0000000000000000
    RBP: ffffffff84bc64cc   R8: ff3d8e412cabdff0   R9: ffffffff84c00e8b
    R10: ff506ca15ff33b69  R11: 0000000000000000  R12: ff506ca15ff33b58
    R13: ffffffff84bc79a3  R14: ff506ca15ff33b38  R15: 0000000000000000
    ORIG_RAX: ff506ca15ff33a80  CS: ff506ca15ff33c78  SS: 0000
#9 [ff506ca15ff33c18] xas_load at ffffffff84b49a7f
#10 [ff506ca15ff33c28] __filemap_get_folio at ffffffff840985da
#11 [ff506ca15ff33ce8] swap_cache_get_folio at ffffffff841119db
#12 [ff506ca15ff33d18] do_swap_page at ffffffff840dbd21
#13 [ff506ca15ff33db8] __handle_mm_fault at ffffffff840ddee3
#14 [ff506ca15ff33e88] handle_mm_fault at ffffffff840de55d
#15 [ff506ca15ff33ec8] do_user_addr_fault at ffffffff83e93247
#16 [ff506ca15ff33f20] exc_page_fault at ffffffff84bc711d
#17 [ff506ca15ff33f50] asm_exc_page_fault at ffffffff84c00b77

I have tried the various patches proposed in this thread, but no luck so far.

Looks like it's stuck in the following loop forever, causing the softlockup/panic:

	if (!folio_try_get_rcu(folio))
		goto repeat;

Looking at the crash dump, mapping->host became NULL. I am not sure what exactly is happening.
Any ideas on how to track this down further are welcome.

struct address_space {
  host = 0x0,
  i_pages = {
    xa_lock = {
      {
        rlock = {
          raw_lock = {
            {
              val = {
                counter = 0
              },
              {
                locked = 0 '\000',
                pending = 0 '\000'
              },
              {
                locked_pending = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    xa_flags = 1,
    xa_head = 0xff3d8e7f9ca41daa
  },
  invalidate_lock = {
    count = {
      counter = 0
    },
    owner = {
      counter = 0
    },
    osq = {
      tail = {
        counter = 0
      }
    },
    wait_lock = {
      raw_lock = {
        {
          val = {
            counter = 0
          },
          {
            locked = 0 '\000',
            pending = 0 '\000'
          },
          {
            locked_pending = 0,
            tail = 0
          }
        }
      }
    },
    wait_list = {
      next = 0x0,
      prev = 0x0
    }
  },
  gfp_mask = 0,
  i_mmap_writable = {
    counter = 0
  },
  i_mmap = {
    rb_root = {
      rb_node = 0x0
    },
    rb_leftmost = 0x0
  },
  i_mmap_rwsem = {
    count = {
      counter = 0
    },
    owner = {
      counter = 0
    },
    osq = {
      tail = {
        counter = 0
      }
    },
    wait_lock = {
      raw_lock = {
        {
          val = {
            counter = 0
          },
          {
            locked = 0 '\000',
            pending = 0 '\000'
          },
          {
            locked_pending = 0,
            tail = 0
          }
        }
      }
    },
    wait_list = {
      next = 0x0,
      prev = 0x0
    }
  },
  nrpages = 1897,
  writeback_index = 0,
  a_ops = 0xffffffff85044560,
  flags = 32,
  wb_err = 0,
  private_lock = {
    {
      rlock = {
        raw_lock = {
          {
            val = {
              counter = 0
            },
            {
              locked = 0 '\000',
              pending = 0 '\000'
            },
            {
              locked_pending = 0,
              tail = 0
            }
          }
        }
      }
    }
  },
  private_list = {
    next = 0x0,
    prev = 0x0
  },
  private_data = 0x0
}
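For reference, the reason this loop can spin forever is the refcount rule
behind folio_try_get_rcu(): once a folio's count drops to zero it can never be
pinned again, so retrying the same slot gains nothing unless the slot is also
cleared. A tiny user-space model of that rule (compare the kernel's
get_page_unless_zero(); the name get_unless_zero below is illustrative, not
the kernel API):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A speculative get succeeds only while the count is still nonzero. */
static bool get_unless_zero(atomic_int *refcount)
{
	int old = atomic_load(refcount);

	while (old != 0) {
		/* Bump the count only if nobody dropped it to 0 meanwhile;
		 * on failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(refcount, &old, old + 1))
			return true;
	}
	return false;	/* count hit zero: the object is being freed */
}
```

Once the count is zero, every subsequent attempt fails, which is exactly why a
bare `goto repeat` against a zombie entry never makes progress.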



Thanks,
Badari


Thread overview: 35+ messages
2022-10-14  5:30 [RFC PATCH] mm: move xa forward when run across zombie page zhaoyang.huang
2022-10-14 12:11 ` Matthew Wilcox
2022-10-17  5:34   ` Zhaoyang Huang
2022-10-17  6:58     ` Zhaoyang Huang
2022-10-17 15:55     ` Matthew Wilcox
2022-10-18  2:52       ` Zhaoyang Huang
2022-10-18  3:09         ` Matthew Wilcox
2022-10-18 22:30           ` Dave Chinner
2022-10-19  1:16             ` Dave Chinner
2022-10-19  4:47               ` Dave Chinner
2022-10-19  5:48                 ` Zhaoyang Huang
2022-10-19 13:06                   ` Matthew Wilcox
2022-10-20  1:27                     ` Zhaoyang Huang
2022-10-26 19:49                   ` Matthew Wilcox
2022-10-27  1:57                     ` Zhaoyang Huang
2022-10-19 11:49             ` Brian Foster
2022-10-20  2:04               ` Dave Chinner
2022-10-20  3:12                 ` Zhaoyang Huang
2022-10-19 15:23             ` Matthew Wilcox
2022-10-19 22:04               ` Dave Chinner
2022-10-19 22:46                 ` Dave Chinner
2022-10-19 23:42                   ` Dave Chinner
2022-10-20 21:52                 ` Matthew Wilcox
2022-10-26  8:38                   ` Zhaoyang Huang
2022-10-26 14:38                     ` Matthew Wilcox
2022-10-26 16:01                   ` Matthew Wilcox
2022-10-28  4:05                     ` Dave Chinner
2022-11-01  7:17                   ` Dave Chinner
2024-04-11  7:04                     ` Zhaoyang Huang
2022-10-21 21:37 Pulavarty, Badari
2022-10-21 22:31 ` Matthew Wilcox
2022-10-21 22:40   ` Pulavarty, Badari
2022-10-31 19:25   ` Pulavarty, Badari
2022-10-31 19:39     ` Hugh Dickins
2022-10-31 21:33       ` Pulavarty, Badari
