From: Ozgun Erdogan <ozgun@citusdata.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, Andi Kleen <andi@firstfloor.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Greg Thelen <gthelen@google.com>,
	Christoph Hellwig <hch@infradead.org>,
	Hugh Dickins <hughd@google.com>, Jan Kara <jack@suse.cz>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan.kim@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rik van Riel <riel@redhat.com>,
	Michel Lespinasse <walken@google.com>,
	Seth Jennings <sjenning@linux.vnet.ibm.com>,
	Roman Gushchin <klamm@yandex-team.ru>,
	Metin Doslu <metin@citusdata.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 0/9] mm: thrash detection-based file cache sizing v3
Date: Fri, 9 Aug 2013 18:39:20 -0700
Message-ID: <CAAxz3Xsn_m5CxudayR+ChTZhS04rGChK+9QM2SWwt1vV_1aDdA@mail.gmail.com>
In-Reply-To: <20130809155309.71d93380425ef8e19c0ff44c@linux-foundation.org>

Hi Andrew,

One common use case where this is really helpful is in data analytics.
Assume that you regularly analyze some chunk of data, say one month's
worth, and you run SQL queries or MapReduce jobs on this data. Let's also
assume you want to serve the current month's data from memory.

Going with an example, let's say data for March takes 60% of total memory.
You run queries over that data, and it gets pulled into the active list.
Come next month, you want to query April's data (which again takes 60% of
memory). Since analytic queries sequentially walk over data, April's data
never becomes active, doesn't stay in memory, and you're stuck with
serving queries from disk.
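
The scenario above can be reproduced with a toy simulation. This is only a
sketch under loud assumptions: the cache is modelled as two plain LRU lists,
a page is promoted to the active list on its second reference while still
resident, and eviction always takes the oldest inactive page first. The class
and function names (TwoListCache, scan, run_example) are made up for
illustration, and the model ignores the kernel's real list balancing.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy active/inactive page cache; NOT the kernel's implementation."""

    def __init__(self, total_pages):
        self.total = total_pages
        self.active = OrderedDict()    # pages referenced at least twice
        self.inactive = OrderedDict()  # pages referenced once; evicted first

    def access(self, page):
        """Touch a page; return True on a cache hit, False on a miss."""
        if page in self.active:
            self.active.move_to_end(page)
            return True
        if page in self.inactive:
            # Second reference while still resident: promote to active.
            del self.inactive[page]
            self.active[page] = True
            return True
        # Miss: make room if the cache is full, then add to the inactive list.
        if len(self.active) + len(self.inactive) >= self.total:
            if self.inactive:
                self.inactive.popitem(last=False)  # oldest inactive page
            else:
                self.active.popitem(last=False)    # assumption: last resort
        self.inactive[page] = True
        return False


def scan(cache, pages):
    """Walk over pages sequentially; return the number of cache hits."""
    return sum(cache.access(p) for p in pages)


def run_example(total_pages=100, month_pages=60):
    """Scan March twice, then April three times; return hits per pass."""
    cache = TwoListCache(total_pages)
    march = [("march", i) for i in range(month_pages)]
    april = [("april", i) for i in range(month_pages)]
    march_hits = [scan(cache, march) for _ in range(2)]
    april_hits = [scan(cache, april) for _ in range(3)]
    return march_hits, april_hits, len(cache.active)


if __name__ == "__main__":
    march_hits, april_hits, active = run_example()
    print(march_hits, april_hits, active)  # [0, 60] [0, 0, 0] 60
```

With 100 pages of memory and 60 pages per month, March ends up entirely on
the active list, leaving 40 pages for April. Each April page is evicted
before the scan comes back to it, so every pass over April misses completely.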

To overcome this issue, you could regularly drop the page cache, or advise
customers to provision clusters whose cumulative memory is 2x the working
set. Neither is ideal. My understanding is that this patch resolves
this issue, but then again my knowledge of the Linux memory manager is
pretty limited. So please call me out if I'm off here.

Thanks,
Ozgun


On Fri, Aug 9, 2013 at 3:53 PM, Andrew Morton <akpm@linux-foundation.org> wrote:

> On Tue,  6 Aug 2013 18:44:01 -0400 Johannes Weiner <hannes@cmpxchg.org>
> wrote:
>
> > This series solves the problem by maintaining a history of pages
> > evicted from the inactive list, enabling the VM to tell streaming IO
> > from thrashing and rebalance the page cache lists when appropriate.
>
> Looks nice. The lack of testing results is conspicuous ;)
>
> It only really solves the problem in the case where
>
>         size-of-inactive-list < size-of-working-set < size-of-total-memory
>
> yes?  In fact less than that, because the active list presumably
> doesn't get shrunk to zero (how far *can* it go?).  I wonder how many
> workloads fit into those constraints in the real world.
>
>
