From: Pankaj Gupta <pagupta@redhat.com>
To: Tariq Toukan <ttoukan.linux@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Tariq Toukan <tariqt@mellanox.com>,
	netdev@vger.kernel.org, akpm@linux-foundation.org,
	linux-mm <linux-mm@kvack.org>,
	Saeed Mahameed <saeedm@mellanox.com>
Subject: Re: Page allocator order-0 optimizations merged
Date: Mon, 27 Mar 2017 03:32:47 -0400 (EDT)
Message-ID: <2123321554.7161128.1490599967015.JavaMail.zimbra@redhat.com>
In-Reply-To: <1fc7338f-2b36-75f7-8a7e-8321f062207b@gmail.com>


Hello,

It looks like a race between softirq and normal process context.

Just thinking: do we really want allocations from softirq context to be
served from the per-cpu lists? Alternatively, 'free_hot_cold_page' could
check whether we are in softirq context (i.e. on the return path from a
hard interrupt) and, if so, avoid the per-cpu list.

Thanks,
Pankaj

> 
> 
> 
> On 26/03/2017 11:21 AM, Tariq Toukan wrote:
> >
> >
> > On 23/03/2017 4:51 PM, Mel Gorman wrote:
> >> On Thu, Mar 23, 2017 at 02:43:47PM +0100, Jesper Dangaard Brouer wrote:
> >>> On Wed, 22 Mar 2017 23:40:04 +0000
> >>> Mel Gorman <mgorman@techsingularity.net> wrote:
> >>>
> >>>> On Wed, Mar 22, 2017 at 07:39:17PM +0200, Tariq Toukan wrote:
> >>>>>>>> This modification may slow allocations from IRQ context slightly
> >>>>>>>> but the
> >>>>>>>> main gain from the per-cpu allocator is that it scales better for
> >>>>>>>> allocations from multiple contexts.  There is an implicit
> >>>>>>>> assumption that
> >>>>>>>> intensive allocations from IRQ contexts on multiple CPUs from a
> >>>>>>>> single
> >>>>>>>> NUMA node are rare
> >>>>> Hi Mel, Jesper, and all.
> >>>>>
> >>>>> This assumption contradicts regular multi-stream traffic that is
> >>>>> naturally handled over close NUMA cores.  I compared iperf TCP
> >>>>> multistream (8 streams)
> >>>>> over CX4 (mlx5 driver) with kernels v4.10 (before this series) vs
> >>>>> kernel v4.11-rc1 (with this series).
> >>>>> I disabled the page-cache (recycle) mechanism to stress the page
> >>>>> allocator,
> >>>>> and see a drastic degradation in BW, from 47.5 G in v4.10 to 31.4 G in
> >>>>> v4.11-rc1 (34% drop).
> >>>>> I noticed queued_spin_lock_slowpath occupies 62.87% of CPU time.
> >>>>
> >>>> Can you get the stack trace for the spin lock slowpath to confirm it's
> >>>> from IRQ context?
> >>>
> >>> AFAIK allocations happen in softirq.  Argh and during review I missed
> >>> that in_interrupt() also covers softirq.  To Mel, can we use a in_irq()
> >>> check instead?
> >>>
> >>> (p.s. just landed and got home)
> >
> > Glad to hear. Thanks for your suggestion.
> >
> >>
> >> Not built or even boot tested. I'm unable to run tests at the moment
> >
> > Thanks Mel, I will test it soon.
> >
> Crashed in iperf single stream test:
> 
> [ 3974.123386] ------------[ cut here ]------------
> [ 3974.128778] WARNING: CPU: 2 PID: 8754 at lib/list_debug.c:53
> __list_del_entry_valid+0xa3/0xd0
> [ 3974.138751] list_del corruption. prev->next should be
> ffffea0040369c60, but was dead000000000100
> [ 3974.149016] Modules linked in: netconsole nfsv3 nfs fscache dm_mirror
> dm_region_hash dm_log dm_mod sb_edac edac_core x86_pkg_temp_thermal
> coretemp i2c_diolan_u2c kvm irqbypass ipmi_si ipmi_devintf crc32_pclmul
> iTCO_wdt ghash_clmulni_intel ipmi_msghandler dcdbas iTCO_vendor_support
> sg pcspkr lpc_ich shpchp wmi mfd_core acpi_power_meter nfsd auth_rpcgss
> nfs_acl lockd grace sunrpc binfmt_misc ip_tables mlx4_en sr_mod cdrom
> sd_mod i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops mlx5_core ttm tg3 ahci libahci mlx4_core drm libata ptp
> megaraid_sas crc32c_intel i2c_core pps_core [last unloaded: netconsole]
> [ 3974.212743] CPU: 2 PID: 8754 Comm: iperf Not tainted 4.11.0-rc2+ #30
> [ 3974.220073] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS
> 1.5.4 10/002/2015
> [ 3974.228974] Call Trace:
> [ 3974.231925]  <IRQ>
> [ 3974.234405]  dump_stack+0x63/0x8c
> [ 3974.238355]  __warn+0xd1/0xf0
> [ 3974.241891]  warn_slowpath_fmt+0x4f/0x60
> [ 3974.246494]  __list_del_entry_valid+0xa3/0xd0
> [ 3974.251583]  get_page_from_freelist+0x84c/0xb40
> [ 3974.256868]  ? napi_gro_receive+0x38/0x140
> [ 3974.261666]  __alloc_pages_nodemask+0xca/0x200
> [ 3974.266866]  mlx5e_alloc_rx_wqe+0x49/0x130 [mlx5_core]
> [ 3974.272862]  mlx5e_post_rx_wqes+0x84/0xc0 [mlx5_core]
> [ 3974.278725]  mlx5e_napi_poll+0xc7/0x450 [mlx5_core]
> [ 3974.284409]  net_rx_action+0x23d/0x3a0
> [ 3974.288819]  __do_softirq+0xd1/0x2a2
> [ 3974.293054]  irq_exit+0xb5/0xc0
> [ 3974.296783]  do_IRQ+0x51/0xd0
> [ 3974.300353]  common_interrupt+0x89/0x89
> [ 3974.304859] RIP: 0010:free_hot_cold_page+0x228/0x280
> [ 3974.310629] RSP: 0018:ffffc9000ea07c90 EFLAGS: 00000202 ORIG_RAX:
> ffffffffffffffa8
> [ 3974.319565] RAX: 0000000000000001 RBX: ffff88103f85f158 RCX:
> ffffea0040369c60
> [ 3974.327764] RDX: ffffea0040369c60 RSI: ffff88103f85f168 RDI:
> ffffea0040369ca0
> [ 3974.335961] RBP: ffffc9000ea07cc0 R08: ffff88103f85f168 R09:
> 00000000000005a8
> [ 3974.344178] R10: 00000000000005a8 R11: 0000000000010468 R12:
> ffffea0040369c80
> [ 3974.352387] R13: ffff88103f85f168 R14: ffff88107ffdeb80 R15:
> ffffea0040369ca0
> [ 3974.360577]  </IRQ>
> [ 3974.363145]  __put_page+0x34/0x40
> [ 3974.367068]  skb_release_data+0xca/0xe0
> [ 3974.371575]  skb_release_all+0x24/0x30
> [ 3974.375984]  __kfree_skb+0x12/0x20
> [ 3974.380003]  tcp_recvmsg+0x6ac/0xaf0
> [ 3974.384251]  inet_recvmsg+0x3c/0xa0
> [ 3974.388394]  sock_recvmsg+0x3d/0x50
> [ 3974.392511]  SYSC_recvfrom+0xd3/0x140
> [ 3974.396826]  ? handle_mm_fault+0xce/0x240
> [ 3974.401535]  ? SyS_futex+0x71/0x150
> [ 3974.405653]  SyS_recvfrom+0xe/0x10
> [ 3974.409673]  entry_SYSCALL_64_fastpath+0x1a/0xa9
> [ 3974.415056] RIP: 0033:0x7f04ca9315bb
> [ 3974.419309] RSP: 002b:00007f04c955de70 EFLAGS: 00000246 ORIG_RAX:
> 000000000000002d
> [ 3974.428243] RAX: ffffffffffffffda RBX: 0000000000020000 RCX:
> 00007f04ca9315bb
> [ 3974.436450] RDX: 0000000000020000 RSI: 00007f04bc0008f0 RDI:
> 0000000000000004
> [ 3974.444653] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> [ 3974.452851] R10: 0000000000000000 R11: 0000000000000246 R12:
> 00007f04bc0008f0
> [ 3974.461051] R13: 0000000000034ac8 R14: 00007f04bc020910 R15:
> 000000000001c480
> [ 3974.469297] ---[ end trace 6fd472c9e1973d53 ]---
> 
> 
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 6cbde310abed..f82225725bc1 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -2481,7 +2481,7 @@ void free_hot_cold_page(struct page *page, bool
> >> cold)
> >>      unsigned long pfn = page_to_pfn(page);
> >>      int migratetype;
> >>
> >> -    if (in_interrupt()) {
> >> +    if (in_irq()) {
> >>          __free_pages_ok(page, 0);
> >>          return;
> >>      }
> >> @@ -2647,7 +2647,7 @@ static struct page *__rmqueue_pcplist(struct
> >> zone *zone, int migratetype,
> >>  {
> >>      struct page *page;
> >>
> >> -    VM_BUG_ON(in_interrupt());
> >> +    VM_BUG_ON(in_irq());
> >>
> >>      do {
> >>          if (list_empty(list)) {
> >> @@ -2704,7 +2704,7 @@ struct page *rmqueue(struct zone *preferred_zone,
> >>      unsigned long flags;
> >>      struct page *page;
> >>
> >> -    if (likely(order == 0) && !in_interrupt()) {
> >> +    if (likely(order == 0) && !in_irq()) {
> >>          page = rmqueue_pcplist(preferred_zone, zone, order,
> >>                  gfp_flags, migratetype);
> >>          goto out;
> >>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
