From: Ryan Roberts <ryan.roberts@arm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Gao Xiang <xiang@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Yang Shi <shy828301@gmail.com>, Michal Hocko <mhocko@suse.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 4/4] mm: swap: Swap-out small-sized THP without splitting
Date: Fri, 3 Nov 2023 11:42:26 +0000	[thread overview]
Message-ID: <1bec0057-8d8c-433f-aee6-e6b2f781aa54@arm.com> (raw)
In-Reply-To: <87pm0v8b32.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 31/10/2023 08:12, Huang, Ying wrote:
> Ryan Roberts <ryan.roberts@arm.com> writes:
> 
>> On 30/10/2023 08:18, Huang, Ying wrote:
>>> Hi, Ryan,
>>>
>>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>>
>>>> The upcoming anonymous small-sized THP feature enables performance
>>>> improvements by allocating large folios for anonymous memory. However,
>>>> I've observed that on an arm64 system running a parallel workload (e.g.
>>>> kernel compilation) across many cores, under high memory pressure, the
>>>> speed regresses. This is due to bottlenecking on the increased number of
>>>> TLBIs added due to all the extra folio splitting.
>>>>
>>>> Therefore, solve this regression by adding support for swapping out
>>>> small-sized THP without needing to split the folio, just like is already
>>>> done for PMD-sized THP. This change only applies when CONFIG_THP_SWAP is
>>>> enabled, and when the swap backing store is a non-rotating block device.
>>>> These are the same constraints as for the existing PMD-sized THP
>>>> swap-out support.
>>>>
>>>> Note that no attempt is made to swap-in THP here - this is still done
>>>> page-by-page, like for PMD-sized THP.
>>>>
>>>> The main change here is to improve the swap entry allocator so that it
>>>> can allocate any power-of-2 number of contiguous entries between [1, (1
>>>> << PMD_ORDER)]. This is done by allocating a cluster for each distinct
>>>> order and allocating sequentially from it until the cluster is full.
>>>> This ensures that we don't need to search the map and we get no
>>>> fragmentation due to alignment padding for different orders in the
>>>> cluster. If there is no current cluster for a given order, we attempt to
>>>> allocate a free cluster from the list. If there are no free clusters, we
>>>> fail the allocation and the caller falls back to splitting the folio and
>>>> allocates individual entries (as per existing PMD-sized THP fallback).
>>>>
>>>> The per-order current clusters are maintained per-cpu using the existing
>>>> infrastructure. This is done to avoid interleaving pages from different
>>>> tasks, which would prevent IO being batched. This is already done for
>>>> the order-0 allocations so we follow the same pattern.
>>>> __scan_swap_map_try_ssd_cluster() is introduced to deal with arbitrary
>>>> orders and scan_swap_map_try_ssd_cluster() is refactored as a wrapper
>>>> for order-0.
>>>>
>>>> As is done for order-0 per-cpu clusters, the scanner now can steal
>>>> order-0 entries from any per-cpu-per-order reserved cluster. This
>>>> ensures that when the swap file is getting full, space doesn't get tied
>>>> up in the per-cpu reserves.
>>>>
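A minimal userspace sketch of the per-order allocation scheme described above (the real implementation is in the diff further down); the free-cluster list, locking, the swap_map occupancy check and order-0 stealing are all simplified away, and every name here is illustrative:

#include <stdbool.h>
#include <stdio.h>

#define SWAPFILE_CLUSTER 512    /* entries per cluster (illustrative) */
#define NR_ORDERS        10     /* orders 0..9 (illustrative) */
#define NEXT_INVALID     0      /* offset 0 is never handed out below */

struct percpu_swap_cursors {
        unsigned int next[NR_ORDERS];   /* next free offset, one per order */
};

/* Stand-in for the free-cluster list: hands out cluster base offsets. */
static unsigned int next_free_cluster = SWAPFILE_CLUSTER;

static unsigned int take_free_cluster(void)
{
        unsigned int base = next_free_cluster;

        next_free_cluster += SWAPFILE_CLUSTER;
        return base;
}

/* Try to allocate 1 << order contiguous swap offsets for this CPU. */
static bool alloc_swap_offsets(struct percpu_swap_cursors *cur, int order,
                               unsigned int *offset)
{
        unsigned int nr = 1u << order;
        unsigned int tmp = cur->next[order];
        unsigned int max;

        if (tmp == NEXT_INVALID) {
                /* No current cluster for this order: reserve a fresh one. */
                tmp = take_free_cluster();
                if (!tmp)
                        return false;   /* caller falls back to splitting */
        }

        *offset = tmp;

        /* Advance the cursor; invalidate it once the cluster is exhausted. */
        max = tmp / SWAPFILE_CLUSTER * SWAPFILE_CLUSTER + SWAPFILE_CLUSTER;
        tmp += nr;
        cur->next[order] = tmp < max ? tmp : NEXT_INVALID;

        return true;
}

int main(void)
{
        struct percpu_swap_cursors cur = { { 0 } };
        unsigned int off;

        /* e.g. order 4: sixteen contiguous entries, one 64K folio on arm64 */
        if (alloc_swap_offsets(&cur, 4, &off))
                printf("order-4 allocation starts at offset %u\n", off);

        return 0;
}
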
>>>> I've run the tests on Ampere Altra (arm64), set up with a 35G block ram
>>>> device as the swap device and from inside a memcg limited to 40G memory.
>>>> I've then run `usemem` from vm-scalability with 70 processes (each has
>>>> its own core), each allocating and writing 1G of memory. I've repeated
>>>> everything 5 times and taken the mean:
>>>>
>>>> Mean Performance Improvement vs 4K/baseline
>>>>
>>>> | alloc size |            baseline |       + this series |
>>>> |            |  v6.6-rc4+anonfolio |                     |
>>>> |:-----------|--------------------:|--------------------:|
>>>> | 4K Page    |                0.0% |                4.9% |
>>>> | 64K THP    |              -44.1% |               10.7% |
>>>> | 2M THP     |               56.0% |               65.9% |
>>>>
>>>> So with this change, the regression for 64K swap performance goes away
>>>> and 4K and 2M swap improves slightly too.
>>>>
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>  include/linux/swap.h |  10 +--
>>>>  mm/swapfile.c        | 149 +++++++++++++++++++++++++++++++------------
>>>>  mm/vmscan.c          |  10 +--
>>>>  3 files changed, 119 insertions(+), 50 deletions(-)
>>>>
>>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>>> index 0ca8aaa098ba..ccbca5db851b 100644
>>>> --- a/include/linux/swap.h
>>>> +++ b/include/linux/swap.h
>>>> @@ -295,11 +295,11 @@ struct swap_info_struct {
>>>>  	unsigned int __percpu *cluster_next_cpu; /*percpu index for next allocation */
>>>>  	unsigned int __percpu *cpu_next;/*
>>>>  					 * Likely next allocation offset. We
>>>> -					 * assign a cluster to each CPU, so each
>>>> -					 * CPU can allocate swap entry from its
>>>> -					 * own cluster and swapout sequentially.
>>>> -					 * The purpose is to optimize swapout
>>>> -					 * throughput.
>>>> +					 * assign a cluster per-order to each
>>>> +					 * CPU, so each CPU can allocate swap
>>>> +					 * entry from its own cluster and
>>>> +					 * swapout sequentially. The purpose is
>>>> +					 * to optimize swapout throughput.
>>>>  					 */
>>>
>>> This is kind of hard to understand.  Better to define some intermediate
>>> data structure to improve readability.  For example,
>>>
>>> #ifdef CONFIG_THP_SWAP
>>> #define NR_SWAP_ORDER   PMD_ORDER
>>> #else
>>> #define NR_SWAP_ORDER   1
>>> #endif
>>>
>>> struct percpu_clusters {
>>>         unsigned int alloc_next[NR_SWAP_ORDER];
>>> };
>>>
>>> PMD_ORDER isn't a constant on powerpc, but THP_SWAP isn't supported on
>>> powerpc either.
>>
>> I get your point, but this is just making it more difficult for powerpc to ever
>> enable the feature in the future - you're implicitly depending on !powerpc, which
>> seems fragile. How about if I change the first line of the comment to be "per-cpu
>> array indexed by allocation order"? Would that be enough?
> 
> Even if PMD_ORDER isn't constant on powerpc, it's not necessary for
> NR_SWAP_ORDER to be variable.  At least (1 << (NR_SWAP_ORDER-1)) should
> be < SWAPFILE_CLUSTER.  When someone adds THP swap support on powerpc, he
> can choose a reasonable constant for NR_SWAP_ORDER (for example, 10 or
> 7).
> 
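For reference, the bound Huang describes could be captured at build time; a minimal sketch reusing the names from the percpu_clusters suggestion above (in-kernel code would more likely use BUILD_BUG_ON() than a bare _Static_assert):

/* Sketch only: mirrors the constraint stated above -- the largest
 * per-order allocation, 1 << (NR_SWAP_ORDER - 1) entries, stays below
 * SWAPFILE_CLUSTER for whatever constant an architecture picks. */
_Static_assert((1UL << (NR_SWAP_ORDER - 1)) < SWAPFILE_CLUSTER,
               "per-order swap allocations must fit within one cluster");
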
>>>
>>>>  	struct rb_root swap_extent_root;/* root of the swap extent rbtree */
>>>>  	struct block_device *bdev;	/* swap device or bdev of swap file */
>>>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>>>> index 94f7cc225eb9..b50bce50bed9 100644
>>>> --- a/mm/swapfile.c
>>>> +++ b/mm/swapfile.c
>>>> @@ -545,10 +545,12 @@ static void free_cluster(struct swap_info_struct *si, unsigned long idx)
>>>>  
>>>>  /*
>>>>   * The cluster corresponding to page_nr will be used. The cluster will be
>>>> - * removed from free cluster list and its usage counter will be increased.
>>>> + * removed from free cluster list and its usage counter will be increased by
>>>> + * count.
>>>>   */
>>>> -static void inc_cluster_info_page(struct swap_info_struct *p,
>>>> -	struct swap_cluster_info *cluster_info, unsigned long page_nr)
>>>> +static void add_cluster_info_page(struct swap_info_struct *p,
>>>> +	struct swap_cluster_info *cluster_info, unsigned long page_nr,
>>>> +	unsigned long count)
>>>>  {
>>>>  	unsigned long idx = page_nr / SWAPFILE_CLUSTER;
>>>>  
>>>> @@ -557,9 +559,19 @@ static void inc_cluster_info_page(struct swap_info_struct *p,
>>>>  	if (cluster_is_free(&cluster_info[idx]))
>>>>  		alloc_cluster(p, idx);
>>>>  
>>>> -	VM_BUG_ON(cluster_count(&cluster_info[idx]) >= SWAPFILE_CLUSTER);
>>>> +	VM_BUG_ON(cluster_count(&cluster_info[idx]) + count > SWAPFILE_CLUSTER);
>>>>  	cluster_set_count(&cluster_info[idx],
>>>> -		cluster_count(&cluster_info[idx]) + 1);
>>>> +		cluster_count(&cluster_info[idx]) + count);
>>>> +}
>>>> +
>>>> +/*
>>>> + * The cluster corresponding to page_nr will be used. The cluster will be
>>>> + * removed from free cluster list and its usage counter will be increased.
>>>> + */
>>>> +static void inc_cluster_info_page(struct swap_info_struct *p,
>>>> +	struct swap_cluster_info *cluster_info, unsigned long page_nr)
>>>> +{
>>>> +	add_cluster_info_page(p, cluster_info, page_nr, 1);
>>>>  }
>>>>  
>>>>  /*
>>>> @@ -588,8 +600,8 @@ static void dec_cluster_info_page(struct swap_info_struct *p,
>>>>   * cluster list. Avoiding such abuse to avoid list corruption.
>>>>   */
>>>>  static bool
>>>> -scan_swap_map_ssd_cluster_conflict(struct swap_info_struct *si,
>>>> -	unsigned long offset)
>>>> +__scan_swap_map_ssd_cluster_conflict(struct swap_info_struct *si,
>>>> +	unsigned long offset, int order)
>>>>  {
>>>>  	bool conflict;
>>>>  
>>>> @@ -601,23 +613,36 @@ scan_swap_map_ssd_cluster_conflict(struct swap_info_struct *si,
>>>>  	if (!conflict)
>>>>  		return false;
>>>>  
>>>> -	*this_cpu_ptr(si->cpu_next) = SWAP_NEXT_NULL;
>>>> +	this_cpu_ptr(si->cpu_next)[order] = SWAP_NEXT_NULL;
>>>
>>> This is added in the previous patch.  I don't think SWAP_NEXT_NULL is a
>>> good name, because NEXT isn't a pointer (while cluster_next is). Better
>>> to name it SWAP_NEXT_INVALID, etc.
>>
>> ACK, will make change for next version.
> 
> Thanks!
> 
>>>
>>>>  	return true;
>>>>  }
>>>>  
>>>>  /*
>>>> - * Try to get a swap entry from current cpu's swap entry pool (a cluster). This
>>>> - * might involve allocating a new cluster for current CPU too.
>>>> + * It's possible scan_swap_map_slots() uses a free cluster in the middle of free
>>>> + * cluster list. Avoiding such abuse to avoid list corruption.
>>>>   */
>>>> -static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
>>>> -	unsigned long *offset, unsigned long *scan_base)
>>>> +static bool
>>>> +scan_swap_map_ssd_cluster_conflict(struct swap_info_struct *si,
>>>> +	unsigned long offset)
>>>> +{
>>>> +	return __scan_swap_map_ssd_cluster_conflict(si, offset, 0);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Try to get a swap entry (or size indicated by order) from current cpu's swap
>>>> + * entry pool (a cluster). This might involve allocating a new cluster for
>>>> + * current CPU too.
>>>> + */
>>>> +static bool __scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
>>>> +	unsigned long *offset, unsigned long *scan_base, int order)
>>>>  {
>>>>  	struct swap_cluster_info *ci;
>>>> -	unsigned int tmp, max;
>>>> +	unsigned int tmp, max, i;
>>>>  	unsigned int *cpu_next;
>>>> +	unsigned int nr_pages = 1 << order;
>>>>  
>>>>  new_cluster:
>>>> -	cpu_next = this_cpu_ptr(si->cpu_next);
>>>> +	cpu_next = &this_cpu_ptr(si->cpu_next)[order];
>>>>  	tmp = *cpu_next;
>>>>  	if (tmp == SWAP_NEXT_NULL) {
>>>>  		if (!cluster_list_empty(&si->free_clusters)) {
>>>> @@ -643,10 +668,12 @@ static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
>>>>  	 * reserve a new cluster.
>>>>  	 */
>>>>  	ci = lock_cluster(si, tmp);
>>>> -	if (si->swap_map[tmp]) {
>>>> -		unlock_cluster(ci);
>>>> -		*cpu_next = SWAP_NEXT_NULL;
>>>> -		goto new_cluster;
>>>> +	for (i = 0; i < nr_pages; i++) {
>>>> +		if (si->swap_map[tmp + i]) {
>>>> +			unlock_cluster(ci);
>>>> +			*cpu_next = SWAP_NEXT_NULL;
>>>> +			goto new_cluster;
>>>> +		}
>>>>  	}
>>>>  	unlock_cluster(ci);
>>>>  
>>>> @@ -654,12 +681,22 @@ static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
>>>>  	*scan_base = tmp;
>>>>  
>>>>  	max = ALIGN_DOWN(tmp, SWAPFILE_CLUSTER) + SWAPFILE_CLUSTER;
>>>
>>> This line is added in a previous patch.  Can we just use
>>>
>>>         max = ALIGN(tmp + 1, SWAPFILE_CLUSTER);
>>
>> Sure. This is how I originally had it, but then decided that the other approach
>> was a bit clearer. But I don't have a strong opinion, so I'll change it as you
>> suggest.
> 
> Thanks!
> 
>>>
>>> Or, add ALIGN_UP() for this?
>>>
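As a quick sanity check on the suggestion above, the two expressions compute the same bound for every tmp; a throwaway userspace test (macros written with plain division rather than the kernel's mask-based versions):

#include <assert.h>

#define ALIGN(x, a)      (((x) + (a) - 1) / (a) * (a))
#define ALIGN_DOWN(x, a) ((x) / (a) * (a))

int main(void)
{
        const unsigned long N = 512;    /* stands in for SWAPFILE_CLUSTER */

        /* Both forms yield the first offset past the end of tmp's cluster. */
        for (unsigned long tmp = 0; tmp < 8 * N; tmp++)
                assert(ALIGN_DOWN(tmp, N) + N == ALIGN(tmp + 1, N));

        return 0;
}
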
>>>> -	tmp += 1;
>>>> +	tmp += nr_pages;
>>>>  	*cpu_next = tmp < max ? tmp : SWAP_NEXT_NULL;
>>>>  
>>>>  	return true;
>>>>  }
>>>>  
>>>> +/*
>>>> + * Try to get a swap entry from current cpu's swap entry pool (a cluster). This
>>>> + * might involve allocating a new cluster for current CPU too.
>>>> + */
>>>> +static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
>>>> +	unsigned long *offset, unsigned long *scan_base)
>>>> +{
>>>> +	return __scan_swap_map_try_ssd_cluster(si, offset, scan_base, 0);
>>>> +}
>>>> +
>>>>  static void __del_from_avail_list(struct swap_info_struct *p)
>>>>  {
>>>>  	int nid;
>>>> @@ -982,35 +1019,58 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
>>>>  	return n_ret;
>>>>  }
>>>>  
>>>> -static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
>>>> +static int swap_alloc_large(struct swap_info_struct *si, swp_entry_t *slot,
>>>> +			    unsigned int nr_pages)
>>>
>>> IMHO, it's better to make scan_swap_map_slots() support order > 0
>>> instead of making swap_alloc_cluster() support order != PMD_ORDER.
>>> And, we may merge swap_alloc_cluster() with scan_swap_map_slots() after
>>> that.
>>
>> I did consider adding a 5th patch to rename swap_alloc_large() to something like
>> swap_alloc_one_ssd_entry() (which would then be used for order=0 too) and
>> refactor scan_swap_map_slots() to fully delegate to it for the non-scanning ssd
>> allocation case. Would something like that suit?
>>
>> I have reservations about making scan_swap_map_slots() take an order and be the
>> sole entry point:
>>
>>   - in the non-ssd case, we can't support order!=0
> 
> We don't need to check ssd directly; we only support order != 0 if
> si->cluster_info != NULL.
> 
>>   - there is a lot of other logic to deal with falling back to scanning which we
>>     would only want to do for order==0, so we would end up with a few ugly
>>     conditionals against order.
> 
> We don't need to care about them in most cases.  IIUC, only the "goto
> scan" after scan_swap_map_try_ssd_cluster() returns false needs to become
> "goto no_page" for order != 0.
> 
>>   - I was concerned the risk of me introducing a bug when refactoring all that
>>     subtle logic was high
> 
> IMHO, readability is more important for long term maintenance.  So, we
> need to refactor the existing code for that.
> 
>> What do you think? Is not making scan_swap_map_slots() support order > 0 a deal
>> breaker for you?
> 
> I just think that it's better to use scan_swap_map_slots() for any order
> other than PMD_ORDER.  In that way, we share as much code as possible.
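Purely to illustrate the control flow being proposed (this is not the kernel function), an order-aware scan_swap_map_slots() might look roughly like the sketch below: the SSD cluster path is shared for every order, only order-0 falls back to scanning the map, and higher orders simply fail so the caller can split the folio. All names and stubs here are invented for the example.

#include <stdbool.h>

struct swap_info_sketch {
        bool has_cluster_info;          /* i.e. si->cluster_info != NULL */
};

/* Stubs standing in for the real helpers. */
static bool try_ssd_cluster(struct swap_info_sketch *si, unsigned long *off,
                            int order)
{
        (void)si; (void)order;
        *off = 0;
        return false;   /* pretend the per-cpu clusters are exhausted */
}

static int scan_for_order0_slot(struct swap_info_sketch *si, unsigned long *off)
{
        (void)si; (void)off;
        return 0;       /* pretend the map scan also found nothing */
}

static int alloc_swap_slots(struct swap_info_sketch *si, unsigned long *off,
                            int order)
{
        if (order > 0 && !si->has_cluster_info)
                return 0;                       /* order > 0 needs clusters */

        if (try_ssd_cluster(si, off, order))
                return 1 << order;

        if (order > 0)
                return 0;       /* "goto no_page": caller splits the folio */

        return scan_for_order0_slot(si, off);  /* legacy order-0 scanning */
}

int main(void)
{
        struct swap_info_sketch si = { .has_cluster_info = true };
        unsigned long off;

        /* With the stubs above this fails; a real caller would then split
         * the folio and retry at order 0. */
        (void)alloc_swap_slots(&si, &off, 4);
        return 0;
}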

OK, I'll take a look at implementing it as you propose, although I likely won't
have bandwidth until the start of December. Will repost once I have something.

Thanks,
Ryan

> 
> --
> Best Regards,
> Huang, Ying


Thread overview: 116+ messages
2023-10-25 14:45 [PATCH v3 0/4] Swap-out small-sized THP without splitting Ryan Roberts
2023-10-25 14:45 ` [PATCH v3 1/4] mm: swap: Remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags Ryan Roberts
2024-02-22 10:19   ` David Hildenbrand
2024-02-22 10:20     ` David Hildenbrand
2024-02-26 17:41       ` Ryan Roberts
2024-02-27 17:10         ` Ryan Roberts
2024-02-27 19:17           ` David Hildenbrand
2024-02-28  9:37             ` Ryan Roberts
2024-02-28 12:12               ` David Hildenbrand
2024-02-28 14:57                 ` Ryan Roberts
2024-02-28 15:12                   ` David Hildenbrand
2024-02-28 15:18                     ` Ryan Roberts
2024-03-01 16:27                     ` Ryan Roberts
2024-03-01 16:31                       ` Matthew Wilcox
2024-03-01 16:44                         ` Ryan Roberts
2024-03-01 17:00                           ` David Hildenbrand
2024-03-01 17:14                             ` Ryan Roberts
2024-03-01 17:18                               ` David Hildenbrand
2024-03-01 17:06                           ` Ryan Roberts
2024-03-04  4:52                             ` Barry Song
2024-03-04  5:42                               ` Barry Song
2024-03-05  7:41                                 ` Ryan Roberts
2024-03-01 16:31                       ` Ryan Roberts
2024-03-01 16:32                       ` David Hildenbrand
2024-03-04 16:03                 ` Ryan Roberts
2024-03-04 17:30                   ` David Hildenbrand
2024-03-04 18:38                     ` Ryan Roberts
2024-03-04 20:50                       ` David Hildenbrand
2024-03-04 21:55                         ` Ryan Roberts
2024-03-04 22:02                           ` David Hildenbrand
2024-03-04 22:34                             ` Ryan Roberts
2024-03-05  6:11                               ` Huang, Ying
2024-03-05  8:35                                 ` David Hildenbrand
2024-03-05  8:46                                   ` Ryan Roberts
2024-02-28 13:33               ` Matthew Wilcox
2024-02-28 14:24                 ` Ryan Roberts
2024-02-28 14:59                   ` Ryan Roberts
2023-10-25 14:45 ` [PATCH v3 2/4] mm: swap: Remove struct percpu_cluster Ryan Roberts
2023-10-25 14:45 ` [PATCH v3 3/4] mm: swap: Simplify ssd behavior when scanner steals entry Ryan Roberts
2023-10-25 14:45 ` [PATCH v3 4/4] mm: swap: Swap-out small-sized THP without splitting Ryan Roberts
2023-10-30  8:18   ` Huang, Ying
2023-10-30 13:59     ` Ryan Roberts
2023-10-31  8:12       ` Huang, Ying
2023-11-03 11:42         ` Ryan Roberts [this message]
2023-11-02  7:40   ` Barry Song
2023-11-02 10:21     ` Ryan Roberts
2023-11-02 22:36       ` Barry Song
2023-11-03 11:31         ` Ryan Roberts
2023-11-03 13:57           ` Steven Price
2023-11-04  9:34             ` Barry Song
2023-11-06 10:12               ` Steven Price
2023-11-06 21:39                 ` Barry Song
2023-11-08 11:51                   ` Steven Price
2023-11-07 12:46               ` Ryan Roberts
2023-11-07 18:05                 ` Barry Song
2023-11-08 11:23                   ` Barry Song
2023-11-08 20:20                     ` Ryan Roberts
2023-11-08 21:04                       ` Barry Song
2023-11-04  5:49           ` Barry Song
2024-02-05  9:51   ` Barry Song
2024-02-05 12:14     ` Ryan Roberts
2024-02-18 23:40       ` Barry Song
2024-02-20 20:03         ` Ryan Roberts
2024-03-05  9:00         ` Ryan Roberts
2024-03-05  9:54           ` Barry Song
2024-03-05 10:44             ` Ryan Roberts
2024-02-27 12:28     ` Ryan Roberts
2024-02-27 13:37     ` Ryan Roberts
2024-02-28  2:46       ` Barry Song
2024-02-22  7:05   ` Barry Song
2024-02-22 10:09     ` David Hildenbrand
2024-02-23  9:46       ` Barry Song
2024-02-27 12:05         ` Ryan Roberts
2024-02-28  1:23           ` Barry Song
2024-02-28  9:34             ` David Hildenbrand
2024-02-28 23:18               ` Barry Song
2024-02-28 15:57             ` Ryan Roberts
2023-11-29  7:47 ` [PATCH v3 0/4] " Barry Song
2023-11-29 12:06   ` Ryan Roberts
2023-11-29 20:38     ` Barry Song
2024-01-18 11:10 ` [PATCH RFC 0/6] mm: support large folios swap-in Barry Song
2024-01-18 11:10   ` [PATCH RFC 1/6] arm64: mm: swap: support THP_SWAP on hardware with MTE Barry Song
2024-01-26 23:14     ` Chris Li
2024-02-26  2:59       ` Barry Song
2024-01-18 11:10   ` [PATCH RFC 2/6] mm: swap: introduce swap_nr_free() for batched swap_free() Barry Song
2024-01-26 23:17     ` Chris Li
2024-02-26  4:47       ` Barry Song
2024-01-18 11:10   ` [PATCH RFC 3/6] mm: swap: make should_try_to_free_swap() support large-folio Barry Song
2024-01-26 23:22     ` Chris Li
2024-01-18 11:10   ` [PATCH RFC 4/6] mm: support large folios swapin as a whole Barry Song
2024-01-27 19:53     ` Chris Li
2024-02-26  7:29       ` Barry Song
2024-01-27 20:06     ` Chris Li
2024-02-26  7:31       ` Barry Song
2024-01-18 11:10   ` [PATCH RFC 5/6] mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap() Barry Song
2024-01-18 11:54     ` David Hildenbrand
2024-01-23  6:49       ` Barry Song
2024-01-29  3:25         ` Chris Li
2024-01-29 10:06           ` David Hildenbrand
2024-01-29 16:31             ` Chris Li
2024-02-26  5:05               ` Barry Song
2024-04-06 23:27             ` Barry Song
2024-01-27 23:41     ` Chris Li
2024-01-18 11:10   ` [PATCH RFC 6/6] mm: madvise: don't split mTHP for MADV_PAGEOUT Barry Song
2024-01-29  2:15     ` Chris Li
2024-02-26  6:39       ` Barry Song
2024-02-27 12:22     ` Ryan Roberts
2024-02-27 22:39       ` Barry Song
2024-02-27 14:40     ` Ryan Roberts
2024-02-27 18:57       ` Barry Song
2024-02-28  3:49         ` Barry Song
2024-01-18 15:25   ` [PATCH RFC 0/6] mm: support large folios swap-in Ryan Roberts
2024-01-18 23:54     ` Barry Song
2024-01-19 13:25       ` Ryan Roberts
2024-01-27 14:27         ` Barry Song
2024-01-29  9:05   ` Huang, Ying
