From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Fengguang Wu <fengguang.wu@intel.com>,
	David Cohen <david.a.cohen@linux.intel.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Damien Ramonda <damien.ramonda@intel.com>,
	Jan Kara <jack@suse.cz>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH V5] mm readahead: Fix readahead fail for no local memory and limit readahead pages
Date: Mon, 10 Feb 2014 17:55:58 +0530
Message-ID: <52F8C556.6090006@linux.vnet.ibm.com>
In-Reply-To: <alpine.DEB.2.02.1402100200420.30650@chino.kir.corp.google.com>

On 02/10/2014 03:35 PM, David Rientjes wrote:
> On Mon, 10 Feb 2014, Raghavendra K T wrote:
>
>> As you rightly pointed out, I'll drop the "remote memory" term and use
>> something like:
>>
>> "* Ensure readahead success on a memoryless node cpu. But we limit
>>   * the readahead to 4k pages to avoid thrashing page cache." ..
>>
>
> I don't know how to proceed here after pointing it out twice, I'm afraid.
>
> numa_mem_id() is local memory for a memoryless node.  node_present_pages()
> has no place in your patch.

Hi David, thanks for the pointer regarding numa_mem_id(). I did not
mean to ignore it or to be offensive; sorry if the conversation came
across that way.

As I understand it, you are suggesting an implementation like one of
the following:

1) I have no problem with the approach below, and I could post it in
the next version.
(But it does not include the 4k limit that Linus asked to apply.)

unsigned long max_sane_readahead(unsigned long nr)
{
         unsigned long local_free_page;
         int nid;

         nid = numa_mem_id();

         /*
          * We sanitize readahead size depending on free memory in
          * the local node.
          */
         local_free_page = node_page_state(nid, NR_INACTIVE_FILE)
                           + node_page_state(nid, NR_FREE_PAGES);
         return min(nr, local_free_page / 2);
}

2) I did not go with the version below because Honza (Jan Kara) had
some concerns about applying the 4k limit in the normal case, and since
I am not an expert, I was waiting for opinions.

unsigned long max_sane_readahead(unsigned long nr)
{
         unsigned long local_free_page, sane_nr;
         int nid;

         nid = numa_mem_id();
         /* limit the max readahead to 4k pages */
         sane_nr = min(nr, MAX_REMOTE_READAHEAD);

         /*
          * We sanitize readahead size depending on free memory in
          * the local node.
          */
         local_free_page = node_page_state(nid, NR_INACTIVE_FILE)
                           + node_page_state(nid, NR_FREE_PAGES);
         return min(sane_nr, local_free_page / 2);
}
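
To make the difference between the two variants concrete, here is a minimal
userspace sketch. It is not kernel code: the model_* names, the per-node
counter table, and the value 4096 for the "4k pages" cap are all assumptions
made for illustration, and model_mem_node merely stands in for whatever
numa_mem_id() would return.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; none of this is the kernel API. */
enum model_stat { MODEL_FREE_PAGES, MODEL_INACTIVE_FILE, MODEL_NR_STATS };

#define MODEL_NODES 2
/* Assumed value for the "4k pages" cap discussed in the thread. */
#define MODEL_MAX_READAHEAD 4096UL

static unsigned long model_stats[MODEL_NODES][MODEL_NR_STATS];
static int model_mem_node;	/* stands in for numa_mem_id() */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Free plus inactive file pages on the node chosen above. */
static unsigned long model_local_free_pages(void)
{
	int nid = model_mem_node;

	return model_stats[nid][MODEL_INACTIVE_FILE]
	       + model_stats[nid][MODEL_FREE_PAGES];
}

/* Variant 1: only the free-page limit. */
unsigned long model_readahead_v1(unsigned long nr)
{
	return min_ul(nr, model_local_free_pages() / 2);
}

/* Variant 2: cap at 4k pages first, then apply the free-page limit. */
unsigned long model_readahead_v2(unsigned long nr)
{
	unsigned long sane_nr = min_ul(nr, MODEL_MAX_READAHEAD);

	return min_ul(sane_nr, model_local_free_pages() / 2);
}

/* Small usage example with made-up page counts. */
void model_self_check(void)
{
	model_mem_node = 1;
	model_stats[1][MODEL_FREE_PAGES] = 10000;
	model_stats[1][MODEL_INACTIVE_FILE] = 2000;

	assert(model_readahead_v1(8192) == 6000);	/* half of 12000 */
	assert(model_readahead_v2(8192) == 4096);	/* hits the 4k cap */
	assert(model_readahead_v1(100) == 100);		/* small requests pass */
}
```

With plenty of memory on the node, the two variants differ only for requests
above the cap; when the node is nearly full, the free-page limit dominates in
both.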

>
>> Regarding ACCESS_ONCE, since we would have to add it
>> inside the function and there is still nothing that could prevent us
>> from running on a different cpu with a different node (as Andrew pointed
>> out), I have not included it in the current patch that I am posting.
>> Moreover, this case is hopefully not fatal since it is just a hint for
>> how much readahead we can do.
>>
>
> I have no idea why you think the ACCESS_ONCE() is a problem.  It's relying
> on gcc's implementation to ensure that the equation is done only for one
> node.  It has absolutely nothing to do with the fact that the process may
> be moved to another cpu upon returning or even immediately after the
> calculation is done.  Is it possible that node0 has 80% of memory free and
> node1 has 80% of memory inactive?  Well, then your equation doesn't work
> quite so well if the process moves.
>
> There is no downside whatsoever to using it, I have no idea why you think
> it's better without it.

I have no problem introducing ACCESS_ONCE too. I skipped it only
after I got the error below.

mm/readahead.c: In function ‘max_sane_readahead’:
mm/readahead.c:246: error: lvalue required as unary ‘&’ operand
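
For reference, a minimal userspace sketch of why this kind of error appears.
It assumes the macro was applied directly to the value of numa_mem_id(),
which the message above does not show. The macro has the same shape as the
kernel's ACCESS_ONCE() of that time (spelled with __typeof__ so it builds
outside the kernel); fake_numa_mem_id() and fake_mem_node are hypothetical
stand-ins, not kernel symbols.

```c
#include <assert.h>

/* Same shape as the kernel macro: it takes the address of its argument. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int fake_mem_node = 1;

/* Hypothetical stand-in that returns a plain value, not an lvalue. */
static int fake_numa_mem_id(void)
{
	return fake_mem_node;
}

int read_node_once(void)
{
	/*
	 * ACCESS_ONCE(fake_numa_mem_id()) would not compile: '&' cannot be
	 * applied to the result of a call ("lvalue required as unary '&'
	 * operand").  Applying the macro to a named object does compile.
	 */
	int nid = fake_numa_mem_id();

	return ACCESS_ONCE(nid);
}
```

This only illustrates the lvalue requirement. Whether reading through a
local gives the guarantee David is asking for depends on how numa_mem_id()
is implemented for a given configuration, which the sketch does not model.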

>
>> So there are many possible implementations:
>> (1) use numa_mem_id(), apply the free page limit, and use the 4k page
>>     limit for all cases
>>     (Jan had reservations about this case)
>>
>> (2) for the normal case: use the free memory calculation and do not
>>     apply the 4k limit (no change).
>>     for the memoryless cpu case: use numa_mem_id() for a more accurate
>>     calculation of the limit and also apply the 4k limit.
>>
>> (3) for the normal case: use the free memory calculation and do not
>>     apply the 4k limit (no change).
>>     for the memoryless case: apply the 4k page limit
>>
>> (4) use numa_mem_id() and apply only the free page limit.
>>
>> So, I'll be resending the patch with changelog and comment changes
>> based on your and Andrew's feedback (type (3) implementation).
>>
>
> It's frustrating to have to say something three times.  Ask yourself what
> happens if ALL NODES WITH CPUS DO NOT HAVE MEMORY?
>

True, and that is why we could go with implementation (1) that I posted
above. I just did not want to float a new version without knowing
whether Andrew was expecting a new patch or only changelog updates.

