From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751528AbcFPEXq (ORCPT ); Thu, 16 Jun 2016 00:23:46 -0400
Received: from mail-pf0-f196.google.com ([209.85.192.196]:36025 "EHLO
        mail-pf0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751066AbcFPEXo (ORCPT ); Thu, 16 Jun 2016 00:23:44 -0400
Date: Thu, 16 Jun 2016 13:23:43 +0900
From: Sergey Senozhatsky
To: Minchan Kim
Cc: Sergey Senozhatsky, Andrew Morton, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, Vlastimil Babka,
        dri-devel@lists.freedesktop.org, Hugh Dickins, John Einar Reitan,
        Jonathan Corbet, Joonsoo Kim, Konstantin Khlebnikov, Mel Gorman,
        Naoya Horiguchi, Rafael Aquini, Rik van Riel, Sergey Senozhatsky,
        virtualization@lists.linux-foundation.org, Gioh Kim,
        Chan Gyun Jeong, Sangseok Lee, Kyeongdon Kim, Chulmin Kim
Subject: Re: [PATCH v7 00/12] Support non-lru page migration
Message-ID: <20160616042343.GA516@swordfish>
References: <1464736881-24886-1-git-send-email-minchan@kernel.org>
        <20160615075909.GA425@swordfish>
        <20160615231248.GI17127@bbox>
        <20160616024827.GA497@swordfish>
        <20160616025800.GO17127@bbox>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160616025800.GO17127@bbox>
User-Agent: Mutt/1.6.1 (2016-04-27)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On (06/16/16 11:58), Minchan Kim wrote:
[..]
> RAX: 2065676162726166 so rax is totally garbage, I think.
> It means obj_to_head returns garbage because get_first_obj_offset is
> utter crab because (page_idx / class->pages_per_zspage) was totally
> wrong.
>
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 6408: f0 0f ba 28 00 lock btsl $0x0,(%rax)
> >
> > > Could you test with [zsmalloc: keep first object offset in struct page]
> > > in mmotm?
> >
> > sure, I can. will it help, tho? we have a race condition here I think.
>
> I guess root cause is caused by get_first_obj_offset.

sounds reasonable.

> Please test with it.

this is what I'm getting with the [zsmalloc: keep first object offset in struct page]
applied: "count:0 mapcount:-127". which may be not related to zsmalloc at this point.

kernel: BUG: Bad page state in process khugepaged pfn:101db8
kernel: page:ffffea0004076e00 count:0 mapcount:-127 mapping: (null) index:0x1
kernel: flags: 0x8000000000000000()
kernel: page dumped because: nonzero mapcount
kernel: Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon crc32c_intel snd_hda_codec_realtek i2c_i801 snd_hda_codec_generic r8169 mii snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd soundcore lpc_ich processor mfd_core sch_fq_codel sd_mod hid_generic usb
kernel: CPU: 3 PID: 38 Comm: khugepaged Not tainted 4.7.0-rc3-next-20160615-dbg-00005-gfd11984-dirty #491
kernel: 0000000000000000 ffff8801124c73f8 ffffffff814d69b0 ffffea0004076e00
kernel: ffffffff81e658a0 ffff8801124c7420 ffffffff811e9b63 0000000000000000
kernel: ffffea0004076e00 ffffffff81e658a0 ffff8801124c7440 ffffffff811e9ca9
kernel: Call Trace:
kernel: [] dump_stack+0x68/0x92
kernel: [] bad_page+0x158/0x1a2
kernel: [] free_pages_check_bad+0xfc/0x101
kernel: [] free_hot_cold_page+0x135/0x5de
kernel: [] __free_pages+0x67/0x72
kernel: [] release_freepages+0x13a/0x191
kernel: [] compact_zone+0x845/0x1155
kernel: [] ? compaction_suitable+0x76/0x76
kernel: [] compact_zone_order+0xe0/0x167
kernel: [] ? compact_zone+0x1155/0x1155
kernel: [] try_to_compact_pages+0x2f1/0x648
kernel: [] ? try_to_compact_pages+0x2f1/0x648
kernel: [] ? compaction_zonelist_suitable+0x3a6/0x3a6
kernel: [] ? get_page_from_freelist+0x2c0/0x133c
kernel: [] __alloc_pages_direct_compact+0xea/0x30d
kernel: [] ? get_page_from_freelist+0x133c/0x133c
kernel: [] ? drain_all_pages+0x1d6/0x205
kernel: [] __alloc_pages_nodemask+0x143d/0x16b6
kernel: [] ? debug_show_all_locks+0x226/0x226
kernel: [] ? warn_alloc_failed+0x24c/0x24c
kernel: [] ? finish_wait+0x1a4/0x1b0
kernel: [] ? lock_acquire+0xec/0x147
kernel: [] ? _raw_spin_unlock_irqrestore+0x3b/0x5c
kernel: [] ? _raw_spin_unlock_irqrestore+0x47/0x5c
kernel: [] ? finish_wait+0x1a4/0x1b0
kernel: [] khugepaged+0x1d4/0x484f
kernel: [] ? hugepage_vma_revalidate+0xef/0xef
kernel: [] ? finish_task_switch+0x3de/0x484
kernel: [] ? _raw_spin_unlock_irq+0x27/0x45
kernel: [] ? trace_hardirqs_on_caller+0x3d2/0x492
kernel: [] ? prepare_to_wait_event+0x3f7/0x3f7
kernel: [] ? __schedule+0xa4d/0xd16
kernel: [] kthread+0x252/0x261
kernel: [] ? hugepage_vma_revalidate+0xef/0xef
kernel: [] ? kthread_create_on_node+0x377/0x377
kernel: [] ret_from_fork+0x1f/0x40
kernel: [] ? kthread_create_on_node+0x377/0x377
-- Reboot --

        -ss
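
A side note on the "rax is totally garbage" remark quoted in the message: one quick way to sanity-check a suspicious register value is to test whether it could be a kernel pointer at all, and to look at the bytes it would occupy in memory. The sketch below is illustrative only. It assumes x86-64 with 48-bit virtual addresses (4-level paging) and little-endian byte order, and both helper names are made up for this note; they are not kernel or zsmalloc APIs.

```python
def is_canonical_kernel_address(value: int) -> bool:
    """Return True if a 64-bit value lies in the upper canonical half.

    Assumption: 48-bit virtual addresses, so bits 63..47 must all be set
    for a kernel-space address.
    """
    return (value >> 47) == 0x1FFFF


def register_bytes_as_ascii(value: int, width: int = 8) -> str:
    """Show the bytes a register value would occupy in little-endian memory.

    Non-printable bytes are rendered as '.' so the result is always displayable.
    """
    raw = value.to_bytes(width, byteorder="little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    rax = 0x2065676162726166            # RAX value quoted in the message
    page = 0xFFFFEA0004076E00           # struct page address from the log

    print(is_canonical_kernel_address(rax))    # False
    print(is_canonical_kernel_address(page))   # True
    print(repr(register_bytes_as_ascii(rax)))  # 'farbage '
```

Under these assumptions the quoted RAX value is not a plausible kernel address and every one of its bytes is printable ASCII, which fits the reading that the pointer was computed from unrelated data. It says nothing about where that data came from.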