* [PATCH] mm, sparse: do not swamp log with huge vmemmap allocation failures
@ 2017-11-06  9:22 ` Michal Hocko
  0 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2017-11-06  9:22 UTC (permalink / raw)
  To: Andrew Morton, Johannes Weiner
  Cc: Vlastimil Babka, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

While doing memory hotplug tests under heavy memory pressure we have
noticed too many page allocation failures when allocating vmemmap memmap
backed by huge pages:
[146792.281354] kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO)
[...]
[146792.281394] Call Trace:
[146792.281430]  [<ffffffff81019a99>] dump_trace+0x59/0x310
[146792.281436]  [<ffffffff81019e3a>] show_stack_log_lvl+0xea/0x170
[146792.281440]  [<ffffffff8101abc1>] show_stack+0x21/0x40
[146792.281448]  [<ffffffff8130f040>] dump_stack+0x5c/0x7c
[146792.281464]  [<ffffffff8118c982>] warn_alloc_failed+0xe2/0x150
[146792.281471]  [<ffffffff8118cddd>] __alloc_pages_nodemask+0x3ed/0xb20
[146792.281489]  [<ffffffff811d3aaf>] alloc_pages_current+0x7f/0x100
[146792.281503]  [<ffffffff815dfa2c>] vmemmap_alloc_block+0x79/0xb6
[146792.281510]  [<ffffffff815dfbd3>] __vmemmap_alloc_block_buf+0x136/0x145
[146792.281524]  [<ffffffff815dd0c5>] vmemmap_populate+0xd2/0x2b9
[146792.281529]  [<ffffffff815dffd9>] sparse_mem_map_populate+0x23/0x30
[146792.281532]  [<ffffffff815df88d>] sparse_add_one_section+0x68/0x18e
[146792.281537]  [<ffffffff815d9f5a>] __add_pages+0x10a/0x1d0
[146792.281553]  [<ffffffff8106249a>] arch_add_memory+0x4a/0xc0
[146792.281559]  [<ffffffff815da1f9>] add_memory_resource+0x89/0x160
[146792.281564]  [<ffffffff815da33d>] add_memory+0x6d/0xd0
[146792.281585]  [<ffffffff813d36c4>] acpi_memory_device_add+0x181/0x251
[146792.281597]  [<ffffffff813946e5>] acpi_bus_attach+0xfd/0x19b
[146792.281602]  [<ffffffff81394866>] acpi_bus_scan+0x59/0x69
[146792.281604]  [<ffffffff813949de>] acpi_device_hotplug+0xd2/0x41f
[146792.281608]  [<ffffffff8138db67>] acpi_hotplug_work_fn+0x1a/0x23
[146792.281623]  [<ffffffff81093cee>] process_one_work+0x14e/0x410
[146792.281630]  [<ffffffff81094546>] worker_thread+0x116/0x490
[146792.281637]  [<ffffffff810999ed>] kthread+0xbd/0xe0
[146792.281651]  [<ffffffff815e4e7f>] ret_from_fork+0x3f/0x70

and we do see many of those because essentially every allocation
fails for each memory section. This is an excessive way to tell the
user that there is nothing to really worry about because we do have
a fallback mechanism to use base pages. The only downside might be a
performance degradation due to TLB pressure.

This patch changes vmemmap_alloc_block to use __GFP_NOWARN and warn
explicitly once on the first allocation failure. This will reduce the
noise in the kernel log considerably, while we still have an indication
that performance might be impacted.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
Hi,
this has somehow fallen off my radar completely. The patch is essentially
what Johannes suggested [1] so I have added his s-o-b and added the
changelog to it.

Can we have this merged?

[1] http://lkml.kernel.org/r/20170711214541.GA11141@cmpxchg.org

 arch/x86/mm/init_64.c |  1 -
 mm/sparse-vmemmap.c   | 11 +++++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 126e09625979..5eb954f930be 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1405,7 +1405,6 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 			vmemmap_verify((pte_t *)pmd, node, addr, next);
 			continue;
 		}
-		pr_warn_once("vmemmap: falling back to regular page backing\n");
 		if (vmemmap_populate_basepages(addr, next, node))
 			return -ENOMEM;
 	}
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index d0860aab1c89..3f85084cb8bb 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -52,12 +52,19 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
 {
 	/* If the main allocator is up use that, fallback to bootmem. */
 	if (slab_is_available()) {
+		gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN;
+		int order = get_order(size);
+		static bool warned;
 		struct page *page;
 
-		page = alloc_pages_node(node, GFP_KERNEL | __GFP_RETRY_MAYFAIL,
-					get_order(size));
+		page = alloc_pages_node(node, gfp_mask, order);
 		if (page)
 			return page_address(page);
+
+		if (!warned) {
+			warn_alloc(gfp_mask, NULL, "vmemmap alloc failure: order:%u", order);
+			warned = true;
+		}
 		return NULL;
 	} else
 		return __earlyonly_bootmem_alloc(node, size, size,
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH] mm, sparse: do not swamp log with huge vmemmap allocation failures
  2017-11-06  9:22 ` Michal Hocko
@ 2017-11-06 17:35   ` Johannes Weiner
  -1 siblings, 0 replies; 51+ messages in thread
From: Johannes Weiner @ 2017-11-06 17:35 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Vlastimil Babka, linux-mm, LKML, Michal Hocko

On Mon, Nov 06, 2017 at 10:22:28AM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> While doing memory hotplug tests under heavy memory pressure we have
> noticed too many page allocation failures when allocating vmemmap memmap
> backed by huge pages:
> [146792.281354] kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO)
> [...]
> [146792.281394] Call Trace:
> [146792.281430]  [<ffffffff81019a99>] dump_trace+0x59/0x310
> [146792.281436]  [<ffffffff81019e3a>] show_stack_log_lvl+0xea/0x170
> [146792.281440]  [<ffffffff8101abc1>] show_stack+0x21/0x40
> [146792.281448]  [<ffffffff8130f040>] dump_stack+0x5c/0x7c
> [146792.281464]  [<ffffffff8118c982>] warn_alloc_failed+0xe2/0x150
> [146792.281471]  [<ffffffff8118cddd>] __alloc_pages_nodemask+0x3ed/0xb20
> [146792.281489]  [<ffffffff811d3aaf>] alloc_pages_current+0x7f/0x100
> [146792.281503]  [<ffffffff815dfa2c>] vmemmap_alloc_block+0x79/0xb6
> [146792.281510]  [<ffffffff815dfbd3>] __vmemmap_alloc_block_buf+0x136/0x145
> [146792.281524]  [<ffffffff815dd0c5>] vmemmap_populate+0xd2/0x2b9
> [146792.281529]  [<ffffffff815dffd9>] sparse_mem_map_populate+0x23/0x30
> [146792.281532]  [<ffffffff815df88d>] sparse_add_one_section+0x68/0x18e
> [146792.281537]  [<ffffffff815d9f5a>] __add_pages+0x10a/0x1d0
> [146792.281553]  [<ffffffff8106249a>] arch_add_memory+0x4a/0xc0
> [146792.281559]  [<ffffffff815da1f9>] add_memory_resource+0x89/0x160
> [146792.281564]  [<ffffffff815da33d>] add_memory+0x6d/0xd0
> [146792.281585]  [<ffffffff813d36c4>] acpi_memory_device_add+0x181/0x251
> [146792.281597]  [<ffffffff813946e5>] acpi_bus_attach+0xfd/0x19b
> [146792.281602]  [<ffffffff81394866>] acpi_bus_scan+0x59/0x69
> [146792.281604]  [<ffffffff813949de>] acpi_device_hotplug+0xd2/0x41f
> [146792.281608]  [<ffffffff8138db67>] acpi_hotplug_work_fn+0x1a/0x23
> [146792.281623]  [<ffffffff81093cee>] process_one_work+0x14e/0x410
> [146792.281630]  [<ffffffff81094546>] worker_thread+0x116/0x490
> [146792.281637]  [<ffffffff810999ed>] kthread+0xbd/0xe0
> [146792.281651]  [<ffffffff815e4e7f>] ret_from_fork+0x3f/0x70
> 
> and we do see many of those because essentially every allocation
> fails for each memory section. This is an excessive way to tell the
> user that there is nothing to really worry about because we do have
> a fallback mechanism to use base pages. The only downside might be a
> performance degradation due to TLB pressure.
> 
> This patch changes vmemmap_alloc_block to use __GFP_NOWARN and warn
> explicitly once on the first allocation failure. This will reduce the
> noise in the kernel log considerably, while we still have an indication
> that performance might be impacted.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> Hi,
> this has somehow fallen off my radar completely. The patch is essentially
> what Johannes suggested [1] so I have added his s-o-b and added the
> changelog to it.

Looks good to me.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] mm, sparse: do not swamp log with huge vmemmap allocation failures
  2017-11-06 17:35   ` Johannes Weiner
  (?)
@ 2017-11-06 17:57   ` Joe Perches
  -1 siblings, 0 replies; 51+ messages in thread
From: Joe Perches @ 2017-11-06 17:57 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, linux-mm, LKML, Michal Hocko

[-- Attachment #1: Type: text/plain, Size: 3867 bytes --]

On Mon, 2017-11-06 at 12:35 -0500, Johannes Weiner wrote:
> On Mon, Nov 06, 2017 at 10:22:28AM +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > While doing memory hotplug tests under heavy memory pressure we have
> > noticed too many page allocation failures when allocating vmemmap memmap
> > backed by huge pages:
> > [146792.281354] kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO)
> > [...]
> > [146792.281394] Call Trace:
> > [146792.281430]  [<ffffffff81019a99>] dump_trace+0x59/0x310
> > [146792.281436]  [<ffffffff81019e3a>] show_stack_log_lvl+0xea/0x170
> > [146792.281440]  [<ffffffff8101abc1>] show_stack+0x21/0x40
> > [146792.281448]  [<ffffffff8130f040>] dump_stack+0x5c/0x7c
> > [146792.281464]  [<ffffffff8118c982>] warn_alloc_failed+0xe2/0x150
> > [146792.281471]  [<ffffffff8118cddd>] __alloc_pages_nodemask+0x3ed/0xb20
> > [146792.281489]  [<ffffffff811d3aaf>] alloc_pages_current+0x7f/0x100
> > [146792.281503]  [<ffffffff815dfa2c>] vmemmap_alloc_block+0x79/0xb6
> > [146792.281510]  [<ffffffff815dfbd3>] __vmemmap_alloc_block_buf+0x136/0x145
> > [146792.281524]  [<ffffffff815dd0c5>] vmemmap_populate+0xd2/0x2b9
> > [146792.281529]  [<ffffffff815dffd9>] sparse_mem_map_populate+0x23/0x30
> > [146792.281532]  [<ffffffff815df88d>] sparse_add_one_section+0x68/0x18e
> > [146792.281537]  [<ffffffff815d9f5a>] __add_pages+0x10a/0x1d0
> > [146792.281553]  [<ffffffff8106249a>] arch_add_memory+0x4a/0xc0
> > [146792.281559]  [<ffffffff815da1f9>] add_memory_resource+0x89/0x160
> > [146792.281564]  [<ffffffff815da33d>] add_memory+0x6d/0xd0
> > [146792.281585]  [<ffffffff813d36c4>] acpi_memory_device_add+0x181/0x251
> > [146792.281597]  [<ffffffff813946e5>] acpi_bus_attach+0xfd/0x19b
> > [146792.281602]  [<ffffffff81394866>] acpi_bus_scan+0x59/0x69
> > [146792.281604]  [<ffffffff813949de>] acpi_device_hotplug+0xd2/0x41f
> > [146792.281608]  [<ffffffff8138db67>] acpi_hotplug_work_fn+0x1a/0x23
> > [146792.281623]  [<ffffffff81093cee>] process_one_work+0x14e/0x410
> > [146792.281630]  [<ffffffff81094546>] worker_thread+0x116/0x490
> > [146792.281637]  [<ffffffff810999ed>] kthread+0xbd/0xe0
> > [146792.281651]  [<ffffffff815e4e7f>] ret_from_fork+0x3f/0x70
> > 
> > and we do see many of those because essentially every allocation
> > fails for each memory section. This is an excessive way to tell the
> > user that there is nothing to really worry about because we do have
> > a fallback mechanism to use base pages. The only downside might be a
> > performance degradation due to TLB pressure.
> > 
> > This patch changes vmemmap_alloc_block to use __GFP_NOWARN and warn
> > explicitly once on the first allocation failure. This will reduce the
> > noise in the kernel log considerably, while we still have an indication
> > that performance might be impacted.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> > Hi,
> > this has somehow fallen off my radar completely. The patch is essentially
> > what Johannes suggested [1] so I have added his s-o-b and added the
> > changelog to it.
> 
> Looks good to me.

I think it'd be better to change the ratelimit state
to something like once a minute
---
 mm/page_alloc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 82e6d2c914ab..af3f92beec04 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3269,8 +3269,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 {
 	struct va_format vaf;
 	va_list args;
-	static DEFINE_RATELIMIT_STATE(nopage_rs, DEFAULT_RATELIMIT_INTERVAL,
-				      DEFAULT_RATELIMIT_BURST);
+	static DEFINE_RATELIMIT_STATE(nopage_rs, HZ * 60, 1);
 
 	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
 		return;

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH] mm, sparse: do not swamp log with huge vmemmap allocation failures
  2017-11-06  9:22 ` Michal Hocko
@ 2017-11-06 18:14   ` Khalid Aziz
  -1 siblings, 0 replies; 51+ messages in thread
From: Khalid Aziz @ 2017-11-06 18:14 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton, Johannes Weiner
  Cc: Vlastimil Babka, linux-mm, LKML, Michal Hocko

On Mon, 2017-11-06 at 10:22 +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> While doing memory hotplug tests under heavy memory pressure we have
> noticed too many page allocation failures when allocating vmemmap memmap
> backed by huge pages:
> ......... deleted .........
> +
> +		if (!warned) {
> +			warn_alloc(gfp_mask, NULL, "vmemmap alloc failure: order:%u", order);
> +			warned = true;
> +		}
>  		return NULL;
>  	} else
>  		return __earlyonly_bootmem_alloc(node, size, size,

This will warn once and only once after a kernel is booted. This
condition may happen repeatedly over a long period of time with
significant time span between two such events and it can be useful to
know if this is happening repeatedly. There might be better ways to
throttle the rate of warnings, something like warn once and then
suppress warnings for the next 15 minutes (or pick any other time
frame). If this condition happens again later, there will be another
warning.
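
A minimal userspace sketch of that kind of throttling, for illustration
only. The names warn_throttle and warn_allowed are hypothetical and not
kernel API; in the kernel the existing ratelimit helpers would be the
natural place to express this. The caller passes the current time in
seconds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not kernel API: one warning per quiet period. */
struct warn_throttle {
	long interval_sec;	/* length of the quiet period */
	long last_sec;		/* when the last warning was emitted */
	bool armed;		/* false until the first warning fires */
};

/* Return true if a warning may be emitted at time now_sec. */
static bool warn_allowed(struct warn_throttle *t, long now_sec)
{
	assert(t != NULL);

	if (t->armed && now_sec - t->last_sec < t->interval_sec)
		return false;	/* still inside the quiet period */

	t->armed = true;
	t->last_sec = now_sec;
	return true;
}
```

With interval_sec set to 15 * 60, failures at 0s, 10s and 900s would
produce a warning at 0s and another at 900s, while the one at 10s stays
silent.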

--
Khalid

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] mm, sparse: do not swamp log with huge vmemmap allocation failures
  2017-11-06 18:14   ` Khalid Aziz
@ 2017-11-06 18:18     ` Michal Hocko
  -1 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2017-11-06 18:18 UTC (permalink / raw)
  To: Khalid Aziz
  Cc: Andrew Morton, Johannes Weiner, Vlastimil Babka, linux-mm, LKML

On Mon 06-11-17 11:14:27, Khalid Aziz wrote:
> On Mon, 2017-11-06 at 10:22 +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > While doing memory hotplug tests under heavy memory pressure we have
> > noticed too many page allocation failures when allocating vmemmap memmap
> > backed by huge pages:
> > ......... deleted .........
> > +
> > +		if (!warned) {
> > +			warn_alloc(gfp_mask, NULL, "vmemmap alloc failure: order:%u", order);
> > +			warned = true;
> > +		}
> >  		return NULL;
> >  	} else
> >  		return __earlyonly_bootmem_alloc(node, size, size,
> 
> This will warn once and only once after a kernel is booted. This
> condition may happen repeatedly over a long period of time with
> significant time span between two such events and it can be useful to
> know if this is happening repeatedly. There might be better ways to
> throttle the rate of warnings, something like warn once and then
> suppress warnings for the next 15 minutes (or pick any other time
> frame). If this condition happens again later, there will be another
> warning.

While this is all true I am not sure we care all that much. The failure
mode is basically not using an optimization. This is not something we
normally warn about. Even the performance degradation is a theoretical
concern which nobody has backed by real life numbers AFAIR.
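
For scale, a rough sketch of the arithmetic behind the fallback. The
constants are assumptions (4 KiB base pages, 128 MiB memory sections and
a 64 byte struct page, which are typical for x86_64 but not stated in
this thread) and the helper names are made up for illustration:

```c
#include <assert.h>

#define BASE_PAGE_SHIFT	12	/* assumed: 4 KiB base pages */

/* Size in bytes of an allocation of the given page order. */
static unsigned long order_to_bytes(unsigned int order)
{
	return 1UL << (order + BASE_PAGE_SHIFT);
}

/* Bytes of struct page needed to describe one memory section. */
static unsigned long section_memmap_bytes(unsigned long section_bytes,
					  unsigned long struct_page_bytes)
{
	assert(struct_page_bytes > 0);
	return (section_bytes >> BASE_PAGE_SHIFT) * struct_page_bytes;
}

/* Number of base pages needed to back the given number of bytes. */
static unsigned long base_pages_needed(unsigned long bytes)
{
	return (bytes + (1UL << BASE_PAGE_SHIFT) - 1) >> BASE_PAGE_SHIFT;
}
```

Under those assumptions section_memmap_bytes(128UL << 20, 64) is 2 MiB,
the same as order_to_bytes(9), so each section needs either one order-9
allocation or 512 base pages.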

If we want to make it more sophisticated I would expect some numbers to
back such a change.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] mm, sparse: do not swamp log with huge vmemmap allocation failures
  2017-11-06 18:18     ` Michal Hocko
@ 2017-11-06 20:17       ` Khalid Aziz
  -1 siblings, 0 replies; 51+ messages in thread
From: Khalid Aziz @ 2017-11-06 20:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Johannes Weiner, Vlastimil Babka, linux-mm, LKML

On 11/06/2017 11:18 AM, Michal Hocko wrote:
> If we want to make it more sophisticated I would expect some numbers to
> back such a change.
> 

That is reasonable enough.

--
Khalid

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC PATCH for 4.15 00/14] Restartable sequences and CPU op vector v10
@ 2017-11-06 20:56 ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Here is an updated rseq patchset taking into account the feedback
received at kernel summit and afterwards.

Use-cases explanation and benchmarks can be found in patch 01
"Restartable sequences system call".

This is still submitted as RFC. I'm keeping a linux-rseq
tree with this patchset at:

https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/tag/?h=v4.14-rc8-rseq-20171106.1

If everyone is happy with it, I plan to submit this officially
when the merge window opens.

Thanks,

Mathieu

Boqun Feng (2):
  Restartable sequences: powerpc architecture support
  Restartable sequences: Wire up powerpc system call

Mathieu Desnoyers (12):
  Restartable sequences system call (v10)
  Restartable sequences: ARM 32 architecture support
  Restartable sequences: wire up ARM 32 system call
  Restartable sequences: x86 32/64 architecture support
  Restartable sequences: wire up x86 32/64 system call
  Provide cpu_opv system call (v2)
  cpu_opv: Wire up x86 32/64 system call
  cpu_opv: Wire up powerpc system call
  cpu_opv: Wire up ARM32 system call
  cpu_opv: Implement selftests (v2)
  Restartable sequences: Provide self-tests (v2)
  Restartable sequences selftests: arm: workaround gcc asm size guess

 MAINTAINERS                                        |   20 +
 arch/Kconfig                                       |    7 +
 arch/arm/Kconfig                                   |    1 +
 arch/arm/kernel/signal.c                           |    7 +
 arch/arm/tools/syscall.tbl                         |    2 +
 arch/powerpc/Kconfig                               |    1 +
 arch/powerpc/include/asm/systbl.h                  |    2 +
 arch/powerpc/include/asm/unistd.h                  |    2 +-
 arch/powerpc/include/uapi/asm/unistd.h             |    2 +
 arch/powerpc/kernel/signal.c                       |    3 +
 arch/x86/Kconfig                                   |    1 +
 arch/x86/entry/common.c                            |    1 +
 arch/x86/entry/syscalls/syscall_32.tbl             |    2 +
 arch/x86/entry/syscalls/syscall_64.tbl             |    2 +
 arch/x86/kernel/signal.c                           |    6 +
 fs/exec.c                                          |    1 +
 include/linux/sched.h                              |   89 ++
 include/trace/events/rseq.h                        |   60 +
 include/uapi/linux/cpu_opv.h                       |  117 ++
 include/uapi/linux/rseq.h                          |  138 +++
 init/Kconfig                                       |   28 +
 kernel/Makefile                                    |    2 +
 kernel/cpu_opv.c                                   |  952 +++++++++++++++
 kernel/fork.c                                      |    2 +
 kernel/rseq.c                                      |  329 +++++
 kernel/sched/core.c                                |   41 +
 kernel/sched/sched.h                               |    2 +
 kernel/sys_ni.c                                    |    4 +
 tools/testing/selftests/Makefile                   |    2 +
 tools/testing/selftests/cpu-opv/.gitignore         |    1 +
 tools/testing/selftests/cpu-opv/Makefile           |   15 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1157 ++++++++++++++++++
 tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
 tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
 tools/testing/selftests/rseq/.gitignore            |    4 +
 tools/testing/selftests/rseq/Makefile              |   22 +
 .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
 tools/testing/selftests/rseq/basic_test.c          |   55 +
 tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
 tools/testing/selftests/rseq/rseq-arm.h            |  568 +++++++++
 tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
 tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
 tools/testing/selftests/rseq/rseq.c                |  116 ++
 tools/testing/selftests/rseq/rseq.h                |  154 +++
 tools/testing/selftests/rseq/run_param_test.sh     |  124 ++
 45 files changed, 7540 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/rseq.h
 create mode 100644 include/uapi/linux/cpu_opv.h
 create mode 100644 include/uapi/linux/rseq.h
 create mode 100644 kernel/cpu_opv.c
 create mode 100644 kernel/rseq.c
 create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
 create mode 100644 tools/testing/selftests/cpu-opv/Makefile
 create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h
 create mode 100644 tools/testing/selftests/rseq/.gitignore
 create mode 100644 tools/testing/selftests/rseq/Makefile
 create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
 create mode 100644 tools/testing/selftests/rseq/basic_test.c
 create mode 100644 tools/testing/selftests/rseq/param_test.c
 create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
 create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
 create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
 create mode 100644 tools/testing/selftests/rseq/rseq.c
 create mode 100644 tools/testing/selftests/rseq/rseq.h
 create mode 100755 tools/testing/selftests/rseq/run_param_test.sh

-- 
2.11.0

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC PATCH v10 for 4.15 01/14] Restartable sequences system call
  2017-11-06 20:56 ` Mathieu Desnoyers
  (?)
@ 2017-11-06 20:56 ` Mathieu Desnoyers
  2017-11-07  1:24     ` Boqun Feng
  -1 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Alexander Viro

Expose a new system call allowing each thread to register one userspace
memory area to be used as an ABI between kernel and user-space for two
purposes: user-space restartable sequences and quick access to read the
current CPU number value from user-space.

* Restartable sequences (per-cpu atomics)

Restartable sequences allow user-space to perform update operations on
per-cpu data without requiring heavy-weight atomic operations.

The restartable critical sections (percpu atomics) work has been started
by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
critical sections. [1] [2] The re-implementation proposed here brings a
few simplifications to the ABI which facilitates porting to other
architectures and speeds up the user-space fast path. A second system
call, cpu_opv(), is proposed as fallback to deal with debugger
single-stepping. cpu_opv() executes a sequence of operations on behalf
of user-space with preemption disabled.

Here are benchmarks of various rseq use-cases.

Test hardware:

arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

The following benchmarks were all performed on a single thread.

* Per-CPU statistic counter increment

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                344.0                 31.4          11.0
x86-64:                15.3                  2.0           7.7
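
For readers unfamiliar with the baseline, the "getcpu+atomic" column can
be pictured roughly as below. This is a minimal sketch and not the
benchmark code measured above: demo_counter, DEMO_NR_SLOTS,
demo_counter_inc and demo_counter_sum are names made up for
illustration, and the slot count is an arbitrary assumption. The atomic
add is needed because the thread may migrate between sched_getcpu() and
the update, which is the cost rseq is meant to avoid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical slot count; a real user would size this from the
 * number of possible CPUs. */
#define DEMO_NR_SLOTS 64

struct demo_counter {
	/* One slot per CPU, padded to limit false sharing. */
	struct {
		atomic_long count;
		char pad[64 - sizeof(atomic_long)];
	} slot[DEMO_NR_SLOTS];
};

/* Baseline increment: look up the CPU, then update its slot atomically. */
static void demo_counter_inc(struct demo_counter *c)
{
	int cpu = sched_getcpu();

	assert(c != NULL);
	if (cpu < 0)
		cpu = 0;	/* sched_getcpu() failed: fall back to slot 0 */
	/* The thread may have migrated since sched_getcpu(), so the
	 * update has to be atomic. */
	atomic_fetch_add_explicit(&c->slot[cpu % DEMO_NR_SLOTS].count, 1,
				  memory_order_relaxed);
}

/* Readers sum all slots to get the counter value. */
static long demo_counter_sum(struct demo_counter *c)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < DEMO_NR_SLOTS; i++)
		sum += atomic_load_explicit(&c->slot[i].count,
					    memory_order_relaxed);
	return sum;
}
```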

* LTTng-UST: write event 32-bit header, 32-bit payload into tracer
             per-cpu buffer

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:               2502.0                 2250.0         1.1
x86-64:               117.4                   98.0         1.2

* liburcu percpu: lock-unlock pair, dereference, read/compare word

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                751.0                 128.5          5.8
x86-64:                53.4                  28.6          1.9

* jemalloc memory allocator adapted to use rseq

Using rseq with per-cpu memory pools in jemalloc at Facebook (based on
rseq 2016 implementation):

The production workload shows a 1-2% improvement in average response
latency, and the P99 overall latency drops by 2-3%.

* Reading the current CPU number

Speeding up reading the current CPU number on which the caller thread is
running is done by keeping the current CPU number up to date within the
cpu_id field of the memory area registered by the thread. This is done
by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
current thread. Upon return to user-space, a notify-resume handler
updates the current CPU value within the registered user-space memory
area. User-space can then read the current CPU number directly from
memory.
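
From the user-space side, the read described above could look like the
following sketch. It is illustrative only: struct demo_rseq_area and
demo_read_cpu() are hypothetical names, only the cpu_id field is
modelled, and falling back to sched_getcpu() when the area is not
registered is an assumption about how a library might behave.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the registered area; only the field used
 * here is modelled. */
struct demo_rseq_area {
	volatile int32_t cpu_id;	/* negative while not registered */
};

/* Return the current CPU number, preferring the cached value. */
static int demo_read_cpu(const struct demo_rseq_area *area)
{
	int32_t cpu;

	assert(area != NULL);
	cpu = area->cpu_id;	/* plain memory load, no system call */
	if (cpu >= 0)
		return cpu;
	return sched_getcpu();	/* slow path: area not registered */
}
```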

Keeping the current cpu id in a memory area shared between kernel and
user-space has the following benefits over the mechanisms currently
available to read the current CPU number:

- 35x speedup on ARM vs system call through glibc
- 20x speedup on x86 compared to calling glibc, which calls vdso
  executing a "lsl" instruction,
- 14x speedup on x86 compared to inlined "lsl" instruction,
- Unlike vdso approaches, this cpu_id value can be read from an inline
  assembly, which makes it a useful building block for restartable
  sequences.
- The approach of reading the cpu id through memory mapping shared
  between kernel and user-space is portable (e.g. ARM), which is not the
  case for the lsl-based x86 vdso.

On x86, yet another possible approach would be to use the gs segment
selector to point to user-space per-cpu data. This approach performs
similarly to the cpu id cache, but it has two disadvantages: it is
not portable, and it is incompatible with existing applications already
using the gs segment selector for other purposes.

Benchmarking various approaches for reading the current CPU number:

ARMv7 Processor rev 4 (v7l)
Machine model: Cubietruck
- Baseline (empty loop):                                    8.4 ns
- Read CPU from rseq cpu_id:                               16.7 ns
- Read CPU from rseq cpu_id (lazy register):               19.8 ns
- glibc 2.19-0ubuntu6.6 getcpu:                           301.8 ns
- getcpu system call:                                     234.9 ns

x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
- Baseline (empty loop):                                    0.8 ns
- Read CPU from rseq cpu_id:                                0.8 ns
- Read CPU from rseq cpu_id (lazy register):                0.8 ns
- Read using gs segment selector:                           0.8 ns
- "lsl" inline assembly:                                   13.0 ns
- glibc 2.19-0ubuntu6 getcpu:                              16.6 ns
- getcpu system call:                                      53.9 ns

- Speed (benchmark taken on v8 of patchset)

Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
expectations, that enabling CONFIG_RSEQ slightly accelerates the
scheduler:

Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
restartable sequences series applied.

* CONFIG_RSEQ=n

avg.:      41.37 s
std.dev.:   0.36 s

* CONFIG_RSEQ=y

avg.:      40.46 s
std.dev.:   0.33 s

- Size

On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
567 bytes, and the data size increase of vmlinux is 5696 bytes.

On x86-64, between CONFIG_CPU_OPV=n/y, the text size increase of vmlinux is
5576 bytes, and the data size increase of vmlinux is 6164 bytes.

[1] https://lwn.net/Articles/650333/
[2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf

Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-api@vger.kernel.org
---

Changes since v1:
- Return -1, errno=EINVAL if cpu_cache pointer is not aligned on
  sizeof(int32_t).
- Update man page to describe the pointer alignment requirements and
  update atomicity guarantees.
- Add MAINTAINERS file GETCPU_CACHE entry.
- Remove dynamic memory allocation: go back to having a single
  getcpu_cache entry per thread. Update documentation accordingly.
- Rebased on Linux 4.4.

Changes since v2:
- Introduce a "cmd" argument, along with an enum with GETCPU_CACHE_GET
  and GETCPU_CACHE_SET. Introduce a uapi header linux/getcpu_cache.h
  defining this enumeration.
- Split resume notifier architecture implementation from the system call
  wire up in the following arch-specific patches.
- Man pages updates.
- Handle 32-bit compat pointers.
- Simplify handling of getcpu_cache GETCPU_CACHE_SET compiler barrier:
  set the current cpu cache pointer before doing the cache update, and
  set it back to NULL if the update fails. Setting it back to NULL on
  error ensures that no resume notifier will trigger a SIGSEGV if a
  migration happened concurrently.

Changes since v3:
- Fix __user annotations in compat code,
- Update memory ordering comments.
- Rebased on kernel v4.5-rc5.

Changes since v4:
- Inline getcpu_cache_fork, getcpu_cache_execve, and getcpu_cache_exit.
- Add new line between if() and switch() to improve readability.
- Added sched switch benchmarks (hackbench) and size overhead comparison
  to change log.

Changes since v5:
- Rename "getcpu_cache" to "thread_local_abi", allowing to extend
  this system call to cover future features such as restartable critical
  sections. Generalizing this system call ensures that we can add
  features similar to the cpu_id field within the same cache-line
  without having to track one pointer per feature within the task
  struct.
- Add a tlabi_nr parameter to the system call, thus allowing to extend
  the ABI beyond the initial 64-byte structure by registering structures
  with tlabi_nr greater than 0. The initial ABI structure is associated
  with tlabi_nr 0.
- Rebased on kernel v4.5.

Changes since v6:
- Integrate "restartable sequences" v2 patchset from Paul Turner.
- Add handling of single-stepping purely in user-space, with a
  fallback to locking after 2 rseq failures to ensure progress, and
  by exposing a __rseq_table section to debuggers so they know where
  to put breakpoints when dealing with rseq assembly blocks which
  can be aborted at any point.
- make the code and ABI generic: porting the kernel implementation
  simply requires to wire up the signal handler and return to user-space
  hooks, and allocate the syscall number.
- extend testing with a fully configurable test program. See
  param_spinlock_test -h for details.
- handling of rseq ENOSYS in user-space, also with a fallback
  to locking.
- modify Paul Turner's rseq ABI to only require a single TLS store on
  the user-space fast-path, removing the need to populate two additional
  registers. This is made possible by introducing struct rseq_cs into
  the ABI to describe a critical section start_ip, post_commit_ip, and
  abort_ip.
- Rebased on kernel v4.7-rc7.

Changes since v7:
- Documentation updates.
- Integrated powerpc architecture support.
- Compare rseq critical section start_ip, allows shrinking the user-space
  fast-path code size.
- Added Peter Zijlstra, Paul E. McKenney and Boqun Feng as
  co-maintainers.
- Added do_rseq2 and do_rseq_memcpy to test program helper library.
- Code cleanup based on review from Peter Zijlstra, Andy Lutomirski and
  Boqun Feng.
- Rebase on kernel v4.8-rc2.

Changes since v8:
- clear rseq_cs even if non-nested. Speeds up user-space fast path by
  removing the final "rseq_cs=NULL" assignment.
- add enum rseq_flags: critical sections and threads can set migration,
  preemption and signal "disable" flags to inhibit rseq behavior.
- rseq_event_counter needs to be updated with a pre-increment: Otherwise
  misses an increment after exec (when TLS and in-kernel states are
  initially 0).

Changes since v9:
- Update changelog.
- Fold instrumentation patch.
- check abort-ip signature: Add a signature before the abort-ip landing
  address. This signature is also received as a new parameter to the
  rseq system call. The kernel uses it to ensure that rseq cannot be used
  as an exploit vector to redirect execution to arbitrary code.
- Use rseq pointer for both register and unregister. This is more
  symmetric, and eventually allows supporting a linked list of rseq
  struct per thread if needed in the future.
- Unregistration of a rseq structure is now done with
  RSEQ_FLAG_UNREGISTER.
- Remove reference counting. Return "EBUSY" to the caller if rseq is
  already registered for the current thread. This simplifies
  implementation while still allowing user-space to perform lazy
  registration in multi-lib use-cases. (suggested by Ben Maurer)
- Clear rseq_cs upon unregister.
- Set cpu_id back to -1 on unregister, so if rseq user libraries follow
  an unregister, and they expect to lazily register rseq, they can do
  so.
- Document rseq_cs clear requirement: JIT should reset the rseq_cs
  pointer before reclaiming memory of rseq_cs structure.
- Introduce rseq_len syscall parameter, rseq_cs version field:
  Allow keeping track of the registered rseq struct length, for future
  extensions. Add rseq_cs version as first field. Will allow future
  extensions.
- Use offset and unsigned arithmetic to save a branch:  Save a
  conditional branch when comparing instruction pointer against a
  rseq_cs descriptor's address range by having post_commit_ip as an
  offset from start_ip, and using unsigned integer comparison.
  Suggested by Ben Maurer.
- Remove event counter from ABI. Suggested by Andy Lutomirski.
- Add INIT_ONSTACK macro: Introduce the
  RSEQ_FIELD_u32_u64_INIT_ONSTACK() macros to ensure that users
  correctly initialize the upper bits of RSEQ_FIELD_u32_u64() on their
  stack to 0 on 32-bit architectures.
- Select MEMBARRIER: Allows user-space rseq fast-paths to use the value
  of cpu_id field (inherently required by the rseq algorithm) to figure
  out whether membarrier can be expected to be available.
  This effectively allows user-space fast-paths to remove extra
  comparisons and branch testing whether membarrier is enabled, and thus
  whether a full barrier is required (e.g. in userspace RCU
  implementation after rcu_read_lock/before rcu_read_unlock).
- Expose cpu_id_start field: Checking whether the (cpu_id < 0) in the C
  preparation part of the rseq fast-path brings significant overhead at
  least on arm32. We can remove this extra comparison by exposing two
  distinct cpu_id fields in the rseq TLS:

  The field cpu_id_start always contains a *possible* cpu number, although
  it may not be the current one if, for instance, rseq is not initialized
  for the current thread. cpu_id_start is meant to be used in the C code
  for the pointer chasing to figure out which per-cpu data structure
  should be passed to the rseq asm sequence.

  The cpu_id field value -1 means rseq is not initialized, and -2 means
  initialization failed. That field is used in the rseq asm sequence to
  confirm that the cpu_id_start value was indeed the current cpu number.
  It also ends up confirming that rseq is initialized for the current
  thread, because values -1 and -2 will never match the cpu_id_start
  possible cpu number values.

  This allows checking the current CPU number and rseq initialization
  state with a single comparison on the fast-path.

Man page associated:

RSEQ(2)                Linux Programmer's Manual               RSEQ(2)

NAME
       rseq - Restartable sequences and cpu number cache

SYNOPSIS
       #include <linux/rseq.h>

       int rseq(struct rseq * rseq, uint32_t rseq_len, int flags, uint32_t sig);

DESCRIPTION
       The  rseq()  ABI  accelerates  user-space operations on per-cpu
       data by defining a shared data structure ABI between each user-
       space thread and the kernel.

       It  allows  user-space  to perform update operations on per-cpu
       data without requiring heavy-weight atomic operations.

       Restartable sequences are atomic  with  respect  to  preemption
       (making  it atomic with respect to other threads running on the
       same CPU), as well as  signal  delivery  (user-space  execution
       contexts nested over the same thread).

       It is suited for update operations on per-cpu data.

       It can be used on data structures shared between threads within
       a process, and on data structures shared between threads across
       different processes.

       Some examples of operations that can be accelerated or improved
       by this ABI:

       · Memory allocator per-cpu free-lists,

       · Querying the current CPU number,

       · Incrementing per-CPU counters,

       · Modifying data protected by per-CPU spinlocks,

       · Inserting/removing elements in per-CPU linked-lists,

       · Writing/reading per-CPU ring buffers content.

       · Accurately reading performance monitoring unit counters  with
         respect to thread migration.

       The  rseq argument is a pointer to the thread-local rseq
       structure to be shared between kernel and user-space. The same
       pointer is passed for unregistration, together with
       RSEQ_FLAG_UNREGISTER in flags.

       The layout of struct rseq is as follows:

       Structure alignment
              This structure is aligned on multiples of 32 bytes.

       Structure size
              This  structure  is  extensible.  Its  size is passed as
              parameter to the rseq system call.

       Fields

           cpu_id_start
              Optimistic cache of the CPU number on which the  current
              thread  is running. Its value is guaranteed to always be
              a possible CPU number, even when rseq  is  not  initial‐
              ized.  The  value it contains should always be confirmed
              by reading the cpu_id field.

           cpu_id
              Cache of the CPU number on which the current  thread  is
              running.  -1 if uninitialized.

           rseq_cs
              The  rseq_cs  field is a pointer to a struct rseq_cs. It
              is NULL when no rseq assembly block critical section  is
              active for the current thread.  Setting it to point to a
              critical section descriptor (struct rseq_cs)  marks  the
              beginning of the critical section.

           flags
              Flags  indicating  the  restart behavior for the current
              thread. This is mainly used for debugging purposes.  Can
              be either:

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE

       The layout of struct rseq_cs version 0 is as follows:

       Structure alignment
              This structure is aligned on multiples of 32 bytes.

       Structure size
              This structure has a fixed size of 32 bytes.

       Fields

           version
              Version of this structure.

           flags
              Flags indicating the restart behavior of this structure.
              Can be either:

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL

       ·      RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE

           start_ip
              Instruction pointer address of the first instruction  of
              the sequence of consecutive assembly instructions.

           post_commit_offset
              Offset  (from start_ip address) of the address after the
              last instruction of the sequence of consecutive assembly
              instructions.

           abort_ip
              Instruction  pointer address where to move the execution
              flow in case of abort of  the  sequence  of  consecutive
              assembly instructions.
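
       As an illustration of the descriptor above, here is a minimal
       sketch of a version 0 layout and of the address-range test that
       post_commit_offset makes possible. struct demo_rseq_cs and
       demo_ip_in_cs() are hypothetical names, and the field widths are
       an assumption chosen to add up to the documented 32 bytes. They
       are not taken from the uapi header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the version 0 descriptor. Field widths are
 * an assumption chosen to match the documented 32-byte size. */
struct demo_rseq_cs {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;
} __attribute__((aligned(32)));

static_assert(sizeof(struct demo_rseq_cs) == 32, "descriptor is 32 bytes");

/* True when ip lies in [start_ip, start_ip + post_commit_offset). */
static bool demo_ip_in_cs(uint64_t ip, const struct demo_rseq_cs *cs)
{
	/* If ip < start_ip the subtraction wraps to a huge value, so a
	 * single unsigned comparison checks both bounds. */
	return ip - cs->start_ip < cs->post_commit_offset;
}
```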

       The  rseq_len argument is the size of the struct rseq to regis‐
       ter.

       The flags argument is 0 for registration, and
       RSEQ_FLAG_UNREGISTER for unregistration.

       The  sig argument is the 32-bit signature to be expected before
       the abort handler code.
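
       The signature check can be pictured with the sketch below, which
       runs over an ordinary byte buffer standing in for user-space
       text. It is an assumption-laden model, not the kernel code:
       demo_abort_sig_ok() is a hypothetical helper, and placing the
       signature in the 4 bytes immediately before the abort address is
       this sketch's reading of "before the abort handler code".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical check over a byte buffer standing in for user-space
 * text: the 4 bytes just before abort_off must equal sig.
 */
static bool demo_abort_sig_ok(const uint8_t *text, size_t len,
			      size_t abort_off, uint32_t sig)
{
	uint32_t found;

	assert(text != NULL);
	if (abort_off < sizeof(found) || abort_off >= len)
		return false;	/* no room for a signature, or out of range */
	/* memcpy avoids an unaligned 32-bit load. */
	memcpy(&found, text + abort_off - sizeof(found), sizeof(found));
	return found == sig;
}
```

       With such a check, an abort_ip that is not preceded by the value
       registered by the thread is rejected rather than jumped to.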

       A single library per process should keep the rseq structure  in
       a  thread-local  storage  variable.  The cpu_id field should be
       initialized to -1, and the cpu_id_start field  should  be  ini‐
       tialized to a possible CPU value (typically 0).

       Each  thread  is  responsible for registering and unregistering
       its rseq structure. No more than one rseq structure address can
       be registered per thread at a given time.

       In  a  typical  usage scenario, the thread registering the rseq
       structure will be performing  loads  and  stores  from/to  that
       structure.  It  is  however also allowed to read that structure
       from other threads.  The rseq field updates  performed  by  the
       kernel  provide  relaxed  atomicity  semantics, which guarantee
       that other threads performing relaxed atomic reads of  the  cpu
       number cache will always observe a consistent value.

RETURN VALUE
       A  return  value  of  0  indicates  success.  On  error,  -1 is
       returned, and errno is set appropriately.

ERRORS
       EINVAL Either flags contains an invalid value, or rseq contains
              an  address  which  is  not  appropriately  aligned,  or
              rseq_len contains a size that does not  match  the  size
              received on registration.

       ENOSYS The  rseq()  system call is not implemented by this ker‐
              nel.

       EFAULT rseq is an invalid address.

       EBUSY  Restartable sequence  is  already  registered  for  this
              thread.

       EPERM  The  sig  argument  on unregistration does not match the
              signature received on registration.

VERSIONS
       The rseq() system call was added in Linux 4.X (TODO).

CONFORMING TO
       rseq() is Linux-specific.

SEE ALSO
       sched_getcpu(3)

Linux                         2017-11-06                       RSEQ(2)
---
 MAINTAINERS                 |  11 ++
 arch/Kconfig                |   7 +
 fs/exec.c                   |   1 +
 include/linux/sched.h       |  89 ++++++++++++
 include/trace/events/rseq.h |  60 ++++++++
 include/uapi/linux/rseq.h   | 138 +++++++++++++++++++
 init/Kconfig                |  14 ++
 kernel/Makefile             |   1 +
 kernel/fork.c               |   2 +
 kernel/rseq.c               | 329 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c         |   4 +
 kernel/sys_ni.c             |   3 +
 12 files changed, 659 insertions(+)
 create mode 100644 include/trace/events/rseq.h
 create mode 100644 include/uapi/linux/rseq.h
 create mode 100644 kernel/rseq.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2f4e462aa4a2..353366928ae8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11484,6 +11484,17 @@ F:	include/dt-bindings/reset/
 F:	include/linux/reset.h
 F:	include/linux/reset-controller.h
 
+RESTARTABLE SEQUENCES SUPPORT
+M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+M:	Peter Zijlstra <peterz@infradead.org>
+M:	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
+M:	Boqun Feng <boqun.feng@gmail.com>
+L:	linux-kernel@vger.kernel.org
+S:	Supported
+F:	kernel/rseq.c
+F:	include/uapi/linux/rseq.h
+F:	include/trace/events/rseq.h
+
 RFKILL
 M:	Johannes Berg <johannes@sipsolutions.net>
 L:	linux-wireless@vger.kernel.org
diff --git a/arch/Kconfig b/arch/Kconfig
index 057370a0ac4e..b5e7f977fc29 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -258,6 +258,13 @@ config HAVE_REGS_AND_STACK_ACCESS_API
 	  declared in asm/ptrace.h
 	  For example the kprobes-based event tracer needs this API.
 
+config HAVE_RSEQ
+	bool
+	depends on HAVE_REGS_AND_STACK_ACCESS_API
+	help
+	  This symbol should be selected by an architecture if it
+	  supports an implementation of restartable sequences.
+
 config HAVE_CLK
 	bool
 	help
diff --git a/fs/exec.c b/fs/exec.c
index 3e14ba25f678..3faf8ff0fc6d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1803,6 +1803,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
 	membarrier_execve(current);
+	rseq_execve(current);
 	acct_update_integrals(current);
 	task_numa_free(current);
 	free_bprm(bprm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index fdf74f27acf1..b995a3b5bfc4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -27,6 +27,7 @@
 #include <linux/signal_types.h>
 #include <linux/mm_types_task.h>
 #include <linux/task_io_accounting.h>
+#include <linux/rseq.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
@@ -977,6 +978,13 @@ struct task_struct {
 	unsigned long			numa_pages_migrated;
 #endif /* CONFIG_NUMA_BALANCING */
 
+#ifdef CONFIG_RSEQ
+	struct rseq __user *rseq;
+	u32 rseq_len;
+	u32 rseq_sig;
+	bool rseq_preempt, rseq_signal, rseq_migrate;
+#endif
+
 	struct tlbflush_unmap_batch	tlb_ubc;
 
 	struct rcu_head			rcu;
@@ -1667,4 +1675,85 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
 #define TASK_SIZE_OF(tsk)	TASK_SIZE
 #endif
 
+#ifdef CONFIG_RSEQ
+static inline void rseq_set_notify_resume(struct task_struct *t)
+{
+	if (t->rseq)
+		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
+}
+void __rseq_handle_notify_resume(struct pt_regs *regs);
+static inline void rseq_handle_notify_resume(struct pt_regs *regs)
+{
+	if (current->rseq)
+		__rseq_handle_notify_resume(regs);
+}
+/*
+ * If the parent process has a registered restartable sequences area,
+ * the child inherits it. This only applies when forking a process, not
+ * a thread. If the parent calls fork() in the middle of a restartable
+ * sequence, set the resume notifier to force the child to retry.
+ */
+static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
+{
+	if (clone_flags & CLONE_THREAD) {
+		t->rseq = NULL;
+		t->rseq_len = 0;
+		t->rseq_sig = 0;
+	} else {
+		t->rseq = current->rseq;
+		t->rseq_len = current->rseq_len;
+		t->rseq_sig = current->rseq_sig;
+		rseq_set_notify_resume(t);
+	}
+}
+static inline void rseq_execve(struct task_struct *t)
+{
+	t->rseq = NULL;
+	t->rseq_len = 0;
+	t->rseq_sig = 0;
+}
+static inline void rseq_sched_out(struct task_struct *t)
+{
+	rseq_set_notify_resume(t);
+}
+static inline void rseq_signal_deliver(struct pt_regs *regs)
+{
+	current->rseq_signal = true;
+	rseq_handle_notify_resume(regs);
+}
+static inline void rseq_preempt(struct task_struct *t)
+{
+	t->rseq_preempt = true;
+}
+static inline void rseq_migrate(struct task_struct *t)
+{
+	t->rseq_migrate = true;
+}
+#else
+static inline void rseq_set_notify_resume(struct task_struct *t)
+{
+}
+static inline void rseq_handle_notify_resume(struct pt_regs *regs)
+{
+}
+static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
+{
+}
+static inline void rseq_execve(struct task_struct *t)
+{
+}
+static inline void rseq_sched_out(struct task_struct *t)
+{
+}
+static inline void rseq_signal_deliver(struct pt_regs *regs)
+{
+}
+static inline void rseq_preempt(struct task_struct *t)
+{
+}
+static inline void rseq_migrate(struct task_struct *t)
+{
+}
+#endif
+
 #endif
diff --git a/include/trace/events/rseq.h b/include/trace/events/rseq.h
new file mode 100644
index 000000000000..4d30d77c86b4
--- /dev/null
+++ b/include/trace/events/rseq.h
@@ -0,0 +1,60 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM rseq
+
+#if !defined(_TRACE_RSEQ_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_RSEQ_H
+
+#include <linux/tracepoint.h>
+#include <linux/types.h>
+
+TRACE_EVENT(rseq_update,
+
+	TP_PROTO(struct task_struct *t),
+
+	TP_ARGS(t),
+
+	TP_STRUCT__entry(
+		__field(s32, cpu_id)
+	),
+
+	TP_fast_assign(
+		__entry->cpu_id = raw_smp_processor_id();
+	),
+
+	TP_printk("cpu_id=%d", __entry->cpu_id)
+);
+
+TRACE_EVENT(rseq_ip_fixup,
+
+	TP_PROTO(void __user *regs_ip, void __user *start_ip,
+		unsigned long post_commit_offset, void __user *abort_ip,
+		int ret),
+
+	TP_ARGS(regs_ip, start_ip, post_commit_offset, abort_ip, ret),
+
+	TP_STRUCT__entry(
+		__field(void __user *, regs_ip)
+		__field(void __user *, start_ip)
+		__field(unsigned long, post_commit_offset)
+		__field(void __user *, abort_ip)
+		__field(int, ret)
+	),
+
+	TP_fast_assign(
+		__entry->regs_ip = regs_ip;
+		__entry->start_ip = start_ip;
+		__entry->post_commit_offset = post_commit_offset;
+		__entry->abort_ip = abort_ip;
+		__entry->ret = ret;
+	),
+
+	TP_printk("regs_ip=%p start_ip=%p post_commit_offset=%lu abort_ip=%p ret=%d",
+		__entry->regs_ip, __entry->start_ip,
+		__entry->post_commit_offset, __entry->abort_ip,
+		__entry->ret)
+);
+
+#endif /* _TRACE_RSEQ_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
new file mode 100644
index 000000000000..28ee2ebd3dae
--- /dev/null
+++ b/include/uapi/linux/rseq.h
@@ -0,0 +1,138 @@
+#ifndef _UAPI_LINUX_RSEQ_H
+#define _UAPI_LINUX_RSEQ_H
+
+/*
+ * linux/rseq.h
+ *
+ * Restartable sequences system call API
+ *
+ * Copyright (c) 2015-2016 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifdef __KERNEL__
+# include <linux/types.h>
+#else	/* #ifdef __KERNEL__ */
+# include <stdint.h>
+#endif	/* #else #ifdef __KERNEL__ */
+
+#include <asm/byteorder.h>
+
+#ifdef __LP64__
+# define RSEQ_FIELD_u32_u64(field)			uint64_t field
+# define RSEQ_FIELD_u32_u64_INIT_ONSTACK(field, v)	field = (intptr_t)v
+#elif defined(__BYTE_ORDER) ? \
+	__BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
+# define RSEQ_FIELD_u32_u64(field)	uint32_t field ## _padding, field
+# define RSEQ_FIELD_u32_u64_INIT_ONSTACK(field, v)	\
+	field ## _padding = 0, field = (intptr_t)v
+#else
+# define RSEQ_FIELD_u32_u64(field)	uint32_t field, field ## _padding
+# define RSEQ_FIELD_u32_u64_INIT_ONSTACK(field, v)	\
+	field = (intptr_t)v, field ## _padding = 0
+#endif
+
+enum rseq_flags {
+	RSEQ_FLAG_UNREGISTER = (1 << 0),
+};
+
+enum rseq_cs_flags {
+	RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT	= (1U << 0),
+	RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL	= (1U << 1),
+	RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE	= (1U << 2),
+};
+
+/*
+ * struct rseq_cs is aligned on 4 * 8 bytes to ensure it is always
+ * contained within a single cache-line. It is usually declared as
+ * link-time constant data.
+ */
+struct rseq_cs {
+	uint32_t version;	/* Version of this structure. */
+	uint32_t flags;		/* enum rseq_cs_flags */
+	RSEQ_FIELD_u32_u64(start_ip);
+	RSEQ_FIELD_u32_u64(post_commit_offset);	/* From start_ip */
+	RSEQ_FIELD_u32_u64(abort_ip);
+} __attribute__((aligned(4 * sizeof(uint64_t))));
+
+/*
+ * struct rseq is aligned on 4 * 8 bytes to ensure it is always
+ * contained within a single cache-line.
+ *
+ * A single struct rseq per thread is allowed.
+ */
+struct rseq {
+	/*
+	 * Restartable sequences cpu_id_start field. Updated by the
+	 * kernel, and read by user-space with single-copy atomicity
+	 * semantics. Aligned on 32-bit. Always contains a value in the
+	 * range of possible CPUs, although the value may not be the
+	 * actual current CPU (e.g. if rseq is not initialized). This
+	 * CPU number value should always be confirmed against the value
+	 * of the cpu_id field.
+	 */
+	uint32_t cpu_id_start;
+	/*
+	 * Restartable sequences cpu_id field. Updated by the kernel,
+	 * and read by user-space with single-copy atomicity semantics.
+	 * Aligned on 32-bit. Values -1U and -2U have a special
+	 * semantic: -1U means "rseq uninitialized", and -2U means "rseq
+	 * initialization failed".
+	 */
+	uint32_t cpu_id;
+	/*
+	 * Restartable sequences rseq_cs field.
+	 *
+	 * Contains NULL when no critical section is active for the current
+	 * thread, or holds a pointer to the currently active struct rseq_cs.
+	 *
+	 * Updated by user-space at the beginning of assembly instruction
+	 * sequence block, and by the kernel when it restarts an assembly
+	 * instruction sequence block, and when the kernel detects that it
+	 * is preempting or delivering a signal outside of the range
+	 * targeted by the rseq_cs. Also needs to be cleared by user-space
+	 * before reclaiming memory that contains the targeted struct
+	 * rseq_cs.
+	 *
+	 * Read and set by the kernel with single-copy atomicity semantics.
+	 * Aligned on 64-bit.
+	 */
+	RSEQ_FIELD_u32_u64(rseq_cs);
+	/*
+	 * - RSEQ_DISABLE flag:
+	 *
+	 * Fallback fast-track flag for single-stepping.
+	 * Set by user-space if lack of progress is detected.
+	 * Cleared by user-space after rseq finish.
+	 * Read by the kernel.
+	 * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
+	 *     Inhibit instruction sequence block restart on preemption
+	 *     for this thread.
+	 * - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
+	 *     Inhibit instruction sequence block restart on signal
+	 *     delivery for this thread.
+	 * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
+	 *     Inhibit instruction sequence block restart on migration
+	 *     for this thread.
+	 */
+	uint32_t flags;
+} __attribute__((aligned(4 * sizeof(uint64_t))));
+
+#endif /* _UAPI_LINUX_RSEQ_H */
diff --git a/init/Kconfig b/init/Kconfig
index 3c1faaa2af4a..cbedfb91b40a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1400,6 +1400,20 @@ config MEMBARRIER
 
 	  If unsure, say Y.
 
+config RSEQ
+	bool "Enable rseq() system call" if EXPERT
+	default y
+	depends on HAVE_RSEQ
+	select MEMBARRIER
+	help
+	  Enable the restartable sequences system call. It provides a
+	  user-space cache for the current CPU number value, which
+	  speeds up getting the current CPU number from user-space,
+	  as well as an ABI to speed up user-space operations on
+	  per-CPU data.
+
+	  If unsure, say Y.
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
diff --git a/kernel/Makefile b/kernel/Makefile
index 172d151d429c..3574669dafd9 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -112,6 +112,7 @@ obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
 obj-$(CONFIG_TORTURE_TEST) += torture.o
 
 obj-$(CONFIG_HAS_IOMEM) += memremap.o
+obj-$(CONFIG_RSEQ) += rseq.o
 
 $(obj)/configs.o: $(obj)/config_data.h
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 07cc743698d3..1f3c25e28742 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1862,6 +1862,8 @@ static __latent_entropy struct task_struct *copy_process(
 	 */
 	copy_seccomp(p);
 
+	rseq_fork(p, clone_flags);
+
 	/*
 	 * Process group and session signals need to be delivered to just the
 	 * parent before the fork or both the parent and the child after the
diff --git a/kernel/rseq.c b/kernel/rseq.c
new file mode 100644
index 000000000000..d15d2ea9ab96
--- /dev/null
+++ b/kernel/rseq.c
@@ -0,0 +1,329 @@
+/*
+ * Restartable sequences system call
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) 2015, Google, Inc.,
+ * Paul Turner <pjt@google.com> and Andrew Hunter <ahh@google.com>
+ * Copyright (C) 2015-2016, EfficiOS Inc.,
+ * Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#include <linux/sched.h>
+#include <linux/uaccess.h>
+#include <linux/syscalls.h>
+#include <linux/rseq.h>
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/rseq.h>
+
+/*
+ *
+ * Restartable sequences are a lightweight interface that allows
+ * user-level code to be executed atomically relative to scheduler
+ * preemption and signal delivery. Typically used for implementing
+ * per-cpu operations.
+ *
+ * It allows user-space to perform update operations on per-cpu data
+ * without requiring heavy-weight atomic operations.
+ *
+ * Detailed algorithm of rseq user-space assembly sequences:
+ *
+ *   Steps [1]-[3] (inclusive) need to be a sequence of instructions in
+ *   userspace that can handle being moved to the abort_ip between any
+ *   of those instructions.
+ *
+ *   The abort_ip address needs to be less than start_ip, or greater
+ *   than or equal to post_commit_ip. Step [5] and the failure code
+ *   step [F1] need to be at addresses lower than start_ip, or greater
+ *   than or equal to post_commit_ip.
+ *
+ *       [start_ip]
+ *   1.  Userspace stores the address of the struct rseq_cs assembly
+ *       block descriptor into the rseq_cs field of the registered
+ *       struct rseq TLS area. This update is performed through a single
+ *       store, followed by a compiler barrier which prevents the
+ *       compiler from moving following loads or stores before this
+ *       store.
+ *
+ *   2.  Userspace tests whether the current cpu_id field matches
+ *       the cpu number loaded before start_ip, manually jumping
+ *       to [F1] in case of a mismatch.
+ *
+ *       Note that if we are preempted or interrupted by a signal
+ *       after [1] and before post_commit_ip, then the kernel
+ *       clears the rseq_cs field of struct rseq, then jumps us to
+ *       abort_ip.
+ *
+ *   3.  Userspace critical section final instruction before
+ *       post_commit_ip is the commit. The critical section is
+ *       self-terminating.
+ *       [post_commit_ip]
+ *
+ *   4.  success
+ *
+ *   On failure at [2]:
+ *
+ *       [abort_ip]
+ *   F1. goto failure label
+ */
+
+static bool rseq_update_cpu_id(struct task_struct *t)
+{
+	uint32_t cpu_id = raw_smp_processor_id();
+
+	if (__put_user(cpu_id, &t->rseq->cpu_id_start))
+		return false;
+	if (__put_user(cpu_id, &t->rseq->cpu_id))
+		return false;
+	trace_rseq_update(t);
+	return true;
+}
+
+static bool rseq_reset_rseq_cpu_id(struct task_struct *t)
+{
+	uint32_t cpu_id_start = 0, cpu_id = -1U;
+
+	/*
+	 * Reset cpu_id_start to its initial state (0).
+	 */
+	if (__put_user(cpu_id_start, &t->rseq->cpu_id_start))
+		return false;
+	/*
+	 * Reset cpu_id to -1U, so any user coming in after unregistration can
+	 * figure out that rseq needs to be registered again.
+	 */
+	if (__put_user(cpu_id, &t->rseq->cpu_id))
+		return false;
+	return true;
+}
+
+static bool rseq_get_rseq_cs(struct task_struct *t,
+		void __user **start_ip,
+		unsigned long *post_commit_offset,
+		void __user **abort_ip,
+		uint32_t *cs_flags)
+{
+	unsigned long ptr;
+	struct rseq_cs __user *urseq_cs;
+	struct rseq_cs rseq_cs;
+	u32 __user *usig;
+	u32 sig;
+
+	if (__get_user(ptr, &t->rseq->rseq_cs))
+		return false;
+	if (!ptr)
+		return true;
+	urseq_cs = (struct rseq_cs __user *)ptr;
+	if (copy_from_user(&rseq_cs, urseq_cs, sizeof(rseq_cs)))
+		return false;
+	/*
+	 * We need to clear rseq_cs upon entry into a signal handler
+	 * nested on top of a rseq assembly block, so the signal handler
+	 * will not be fixed up if itself interrupted by a nested signal
+	 * handler or preempted.  We also need to clear rseq_cs if we
+	 * preempt or deliver a signal on top of code outside of the
+	 * rseq assembly block, to ensure that a following preemption or
+	 * signal delivery will not try to perform a fixup needlessly.
+	 */
+	if (clear_user(&t->rseq->rseq_cs, sizeof(t->rseq->rseq_cs)))
+		return false;
+	if (rseq_cs.version > 0)
+		return false;
+	*cs_flags = rseq_cs.flags;
+	*start_ip = (void __user *)rseq_cs.start_ip;
+	*post_commit_offset = (unsigned long)rseq_cs.post_commit_offset;
+	*abort_ip = (void __user *)rseq_cs.abort_ip;
+	usig = (u32 __user *)(rseq_cs.abort_ip - sizeof(u32));
+	if (get_user(sig, usig))
+		return false;
+	if (current->rseq_sig != sig) {
+		printk_ratelimited(KERN_WARNING
+			"Possible attack attempt. Unexpected rseq signature 0x%x, expecting 0x%x (pid=%d, addr=%p).\n",
+			sig, current->rseq_sig, current->pid, usig);
+		return false;
+	}
+	return true;
+}
+
+static int rseq_need_restart(struct task_struct *t, uint32_t cs_flags)
+{
+	bool need_restart = false;
+	uint32_t flags;
+
+	/* Get thread flags. */
+	if (__get_user(flags, &t->rseq->flags))
+		return -EFAULT;
+
+	/* Take into account critical section flags. */
+	flags |= cs_flags;
+
+	/*
+	 * Restart on signal can only be inhibited when restart on
+	 * preempt and restart on migrate are inhibited too. Otherwise,
+	 * a preempted signal handler could fail to restart the prior
+	 * execution context on sigreturn.
+	 */
+	if (flags & RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL) {
+		if (!(flags & RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE))
+			return -EINVAL;
+		if (!(flags & RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT))
+			return -EINVAL;
+	}
+	if (t->rseq_migrate
+			&& !(flags & RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE))
+		need_restart = true;
+	else if (t->rseq_preempt
+			&& !(flags & RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT))
+		need_restart = true;
+	else if (t->rseq_signal
+			&& !(flags & RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL))
+		need_restart = true;
+
+	t->rseq_preempt = false;
+	t->rseq_signal = false;
+	t->rseq_migrate = false;
+	if (need_restart)
+		return 1;
+	return 0;
+}
+
+static int rseq_ip_fixup(struct pt_regs *regs)
+{
+	struct task_struct *t = current;
+	void __user *start_ip = NULL;
+	unsigned long post_commit_offset = 0;
+	void __user *abort_ip = NULL;
+	uint32_t cs_flags = 0;
+	int ret;
+
+	ret = rseq_get_rseq_cs(t, &start_ip, &post_commit_offset, &abort_ip,
+			&cs_flags);
+	trace_rseq_ip_fixup((void __user *)instruction_pointer(regs),
+		start_ip, post_commit_offset, abort_ip, ret);
+	if (!ret)
+		return -EFAULT;
+
+	ret = rseq_need_restart(t, cs_flags);
+	if (ret < 0)
+		return -EFAULT;
+	if (!ret)
+		return 0;
+	/*
+	 * Handle potentially not being within a critical section.
+	 * Unsigned comparison will be true when
+	 * ip < start_ip (wrap-around to large values), and when
+	 * ip >= start_ip + post_commit_offset.
+	 */
+	if ((unsigned long)instruction_pointer(regs) - (unsigned long)start_ip
+			>= post_commit_offset)
+		return 1;
+
+	instruction_pointer_set(regs, (unsigned long)abort_ip);
+	return 1;
+}
+
+/*
+ * This resume handler should always be executed between any of:
+ * - preemption,
+ * - signal delivery,
+ * and return to user-space.
+ *
+ * This is how we can ensure that the entire rseq critical section,
+ * consisting of both the C part and the assembly instruction sequence,
+ * will issue the commit instruction only if executed atomically with
+ * respect to other threads scheduled on the same CPU, and with respect
+ * to signal handlers.
+ */
+void __rseq_handle_notify_resume(struct pt_regs *regs)
+{
+	struct task_struct *t = current;
+	int ret;
+
+	if (unlikely(t->flags & PF_EXITING))
+		return;
+	if (unlikely(!access_ok(VERIFY_WRITE, t->rseq, sizeof(*t->rseq))))
+		goto error;
+	ret = rseq_ip_fixup(regs);
+	if (unlikely(ret < 0))
+		goto error;
+	if (unlikely(!rseq_update_cpu_id(t)))
+		goto error;
+	return;
+
+error:
+	force_sig(SIGSEGV, t);
+}
+
+/*
+ * sys_rseq - set up restartable sequences for the caller thread.
+ */
+SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, uint32_t, rseq_len,
+		int, flags, uint32_t, sig)
+{
+	if (flags & RSEQ_FLAG_UNREGISTER) {
+		/* Unregister rseq for current thread. */
+		if (current->rseq != rseq || !current->rseq)
+			return -EINVAL;
+		if (current->rseq_len != rseq_len)
+			return -EINVAL;
+		if (current->rseq_sig != sig)
+			return -EPERM;
+		if (!rseq_reset_rseq_cpu_id(current))
+			return -EFAULT;
+		current->rseq = NULL;
+		current->rseq_len = 0;
+		current->rseq_sig = 0;
+		return 0;
+	}
+
+	if (unlikely(flags))
+		return -EINVAL;
+
+	if (current->rseq) {
+		/*
+		 * If rseq is already registered, check whether
+		 * the provided address differs from the prior
+		 * one.
+		 */
+		if (current->rseq != rseq
+				|| current->rseq_len != rseq_len)
+			return -EINVAL;
+		if (current->rseq_sig != sig)
+			return -EPERM;
+		return -EBUSY;	/* Already registered. */
+	} else {
+		/*
+		 * If there was no rseq previously registered,
+		 * we need to ensure the provided rseq is
+		 * properly aligned and valid.
+		 */
+		if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq))
+				|| rseq_len != sizeof(*rseq))
+			return -EINVAL;
+		if (!access_ok(VERIFY_WRITE, rseq, rseq_len))
+			return -EFAULT;
+		current->rseq = rseq;
+		current->rseq_len = rseq_len;
+		current->rseq_sig = sig;
+		/*
+		 * If rseq was previously inactive, and has just
+		 * been registered, ensure the cpu_id_start and
+		 * cpu_id fields are updated before
+		 * returning to user-space.
+		 */
+		rseq_set_notify_resume(current);
+	}
+
+	return 0;
+}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d17c5da523a0..6bba05f47e51 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1179,6 +1179,8 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	WARN_ON_ONCE(!cpu_online(new_cpu));
 #endif
 
+	rseq_migrate(p);
+
 	trace_sched_migrate_task(p, new_cpu);
 
 	if (task_cpu(p) != new_cpu) {
@@ -2581,6 +2583,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 {
 	sched_info_switch(rq, prev, next);
 	perf_event_task_sched_out(prev, next);
+	rseq_sched_out(prev);
 	fire_sched_out_preempt_notifiers(prev, next);
 	prepare_lock_switch(rq, next);
 	prepare_arch_switch(next);
@@ -3341,6 +3344,7 @@ static void __sched notrace __schedule(bool preempt)
 	clear_preempt_need_resched();
 
 	if (likely(prev != next)) {
+		rseq_preempt(prev);
 		rq->nr_switches++;
 		rq->curr = next;
 		/*
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index b5189762d275..bfa1ee1bf669 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -259,3 +259,6 @@ cond_syscall(sys_membarrier);
 cond_syscall(sys_pkey_mprotect);
 cond_syscall(sys_pkey_alloc);
 cond_syscall(sys_pkey_free);
+
+/* restartable sequence */
+cond_syscall(sys_rseq);
-- 
2.11.0


* [RFC PATCH for 4.15 02/14] Restartable sequences: ARM 32 architecture support
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Call the rseq_handle_notify_resume() function on return to
userspace if TIF_NOTIFY_RESUME thread flag is set.

Perform fixup on the pre-signal frame when a signal is delivered on top
of a restartable sequence critical section.
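
The fixup mentioned above hinges on a single unsigned comparison in
rseq_ip_fixup(). The sketch below restates it as a standalone C function
so the wrap-around behaviour can be checked in isolation; the function
names are hypothetical and the addresses used with it are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when ip lies in [start_ip, start_ip + post_commit_offset).
 * If ip < start_ip, the unsigned subtraction wraps around to a large
 * value, so one comparison covers both bounds.
 */
static bool ip_in_critical_section(uintptr_t ip, uintptr_t start_ip,
				   uintptr_t post_commit_offset)
{
	return ip - start_ip < post_commit_offset;
}

/* Returns the instruction pointer to resume at after the fixup. */
static uintptr_t fixup_ip_sketch(uintptr_t ip, uintptr_t start_ip,
				 uintptr_t post_commit_offset,
				 uintptr_t abort_ip)
{
	if (ip_in_critical_section(ip, start_ip, post_commit_offset))
		return abort_ip;	/* Interrupted before the commit. */
	return ip;			/* Outside the sequence: untouched. */
}
```

An instruction pointer equal to start_ip + post_commit_offset is already
past the commit, so it is left alone.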

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/arm/Kconfig         | 1 +
 arch/arm/kernel/signal.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index d1346a160760..1469f3f39475 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -87,6 +87,7 @@ config ARM
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
 	select HAVE_REGS_AND_STACK_ACCESS_API
+	select HAVE_RSEQ
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UID16
 	select HAVE_VIRT_CPU_ACCOUNTING_GEN
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index b67ae12503f3..cc3260f475b0 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -518,6 +518,12 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	int ret;
 
 	/*
+	 * Perform fixup for the pre-signal frame when a signal is
+	 * delivered on top of a restartable sequence critical section.
+	 */
+	rseq_signal_deliver(regs);
+
+	/*
 	 * Set up the stack frame
 	 */
 	if (ksig->ka.sa.sa_flags & SA_SIGINFO)
@@ -637,6 +643,7 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
 			} else {
 				clear_thread_flag(TIF_NOTIFY_RESUME);
 				tracehook_notify_resume(regs);
+				rseq_handle_notify_resume(regs);
 			}
 		}
 		local_irq_disable();
-- 
2.11.0


* [RFC PATCH for 4.15 03/14] Restartable sequences: wire up ARM 32 system call
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Wire up the rseq system call on 32-bit ARM.

This provides an ABI improving the speed of a user-space getcpu
operation on ARM by skipping the getcpu system call on the fast path, as
well as improving the speed of user-space operations on per-cpu data
compared to using load-linked/store-conditional.
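
A user-space getcpu fast path of the kind described above might look
like the following sketch. It assumes the caller passes a pointer to the
cpu_id field of its registered struct rseq; the function name
getcpu_sketch() is hypothetical. When the cached value is negative
(rseq uninitialized or registration failed), it falls back to the
getcpu system call.

```c
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the current CPU number, or -1 if the fallback fails. */
static int getcpu_sketch(const volatile uint32_t *cached_cpu_id)
{
	int32_t cached = (int32_t)*cached_cpu_id;
	unsigned int cpu;

	if (cached >= 0)
		return cached;	/* Fast path: no system call. */
	/* Slow path: rseq is not registered for this thread. */
	if (syscall(SYS_getcpu, &cpu, NULL, NULL) < 0)
		return -1;
	return (int)cpu;
}
```

The value is only a hint in both cases, since the thread may migrate
right after the read.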

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/arm/tools/syscall.tbl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 0bb0e9c6376c..fbc74b5fa3ed 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -412,3 +412,4 @@
 395	common	pkey_alloc		sys_pkey_alloc
 396	common	pkey_free		sys_pkey_free
 397	common	statx			sys_statx
+398	common	rseq			sys_rseq
-- 
2.11.0


* [RFC PATCH for 4.15 04/14] Restartable sequences: x86 32/64 architecture support
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Call the rseq_handle_notify_resume() function on return to userspace if
TIF_NOTIFY_RESUME thread flag is set.

Perform fixup on the pre-signal frame when a signal is delivered on top
of a restartable sequence critical section.
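
Whether a fixup is attempted at all depends on which events occurred and
on the NO_RESTART flags, as decided by rseq_need_restart() in patch 01.
The sketch below restates that decision as a pure C function for
illustration. The flag values mirror enum rseq_cs_flags from that patch;
struct rseq_events and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as enum rseq_cs_flags in the uapi header of this series. */
#define NO_RESTART_ON_PREEMPT	(1U << 0)
#define NO_RESTART_ON_SIGNAL	(1U << 1)
#define NO_RESTART_ON_MIGRATE	(1U << 2)

/* Hypothetical stand-in for the three per-task event booleans. */
struct rseq_events {
	bool preempted;
	bool signaled;
	bool migrated;
};

/* Returns 1 to restart, 0 to continue, -1 on an invalid flag mix. */
static int need_restart_sketch(struct rseq_events ev, uint32_t flags)
{
	/* Inhibiting restart on signal requires inhibiting the others too. */
	if ((flags & NO_RESTART_ON_SIGNAL) &&
	    (!(flags & NO_RESTART_ON_MIGRATE) ||
	     !(flags & NO_RESTART_ON_PREEMPT)))
		return -1;
	if (ev.migrated && !(flags & NO_RESTART_ON_MIGRATE))
		return 1;
	if (ev.preempted && !(flags & NO_RESTART_ON_PREEMPT))
		return 1;
	if (ev.signaled && !(flags & NO_RESTART_ON_SIGNAL))
		return 1;
	return 0;
}
```

In the kernel code the invalid combination ends with SIGSEGV being
delivered to the thread.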

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/x86/Kconfig         | 1 +
 arch/x86/entry/common.c  | 1 +
 arch/x86/kernel/signal.c | 6 ++++++
 3 files changed, 8 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2fdb23313dd5..01f78c1d40b5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -173,6 +173,7 @@ config X86
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if X86_64 && FRAME_POINTER_UNWINDER && STACK_VALIDATION
 	select HAVE_STACK_VALIDATION		if X86_64
+	select HAVE_RSEQ
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_USER_RETURN_NOTIFIER
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 03505ffbe1b6..4c717bdd1139 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -160,6 +160,7 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
 		if (cached_flags & _TIF_NOTIFY_RESUME) {
 			clear_thread_flag(TIF_NOTIFY_RESUME);
 			tracehook_notify_resume(regs);
+			rseq_handle_notify_resume(regs);
 		}
 
 		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index b9e00e8f1c9b..991017d26d00 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -687,6 +687,12 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
 	sigset_t *set = sigmask_to_save();
 	compat_sigset_t *cset = (compat_sigset_t *) set;
 
+	/*
+	 * Increment event counter and perform fixup for the pre-signal
+	 * frame.
+	 */
+	rseq_signal_deliver(regs);
+
 	/* Set up the stack frame */
 	if (is_ia32_frame(ksig)) {
 		if (ksig->ka.sa.sa_flags & SA_SIGINFO)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH for 4.15 05/14] Restartable sequences: wire up x86 32/64 system call
  2017-11-06 20:56 ` Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  (?)
@ 2017-11-06 20:56 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Wire up the rseq system call on x86 32/64.

This provides an ABI improving the speed of a user-space getcpu
operation on x86 by removing the need to perform a function call, "lsl"
instruction, or system call on the fast path, as well as improving the
speed of user-space operations on per-cpu data.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 448ac2161112..ba43ee75e425 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
 382	i386	pkey_free		sys_pkey_free
 383	i386	statx			sys_statx
 384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
+385	i386	rseq			sys_rseq
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..3ad03495bbb9 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
 330	common	pkey_alloc		sys_pkey_alloc
 331	common	pkey_free		sys_pkey_free
 332	common	statx			sys_statx
+333	common	rseq			sys_rseq
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH for 4.15 06/14] Restartable sequences: powerpc architecture support
@ 2017-11-06 20:56   ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev

From: Boqun Feng <boqun.feng@gmail.com>

Call the rseq_handle_notify_resume() function on return to userspace if
TIF_NOTIFY_RESUME thread flag is set.

Increment the event counter and perform fixup on the pre-signal frame when
a signal is delivered on top of a restartable sequence critical section.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/Kconfig         | 1 +
 arch/powerpc/kernel/signal.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cb782ac1c35d..41d1dae3b1b5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -219,6 +219,7 @@ config PPC
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_VIRT_CPU_ACCOUNTING
 	select HAVE_IRQ_TIME_ACCOUNTING
+	select HAVE_RSEQ
 	select IRQ_DOMAIN
 	select IRQ_FORCED_THREADING
 	select MODULES_USE_ELF_RELA
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index e9436c5e1e09..17a994b801b1 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk)
 	/* Re-enable the breakpoints for the signal stack */
 	thread_change_pc(tsk, tsk->thread.regs);
 
+	rseq_signal_deliver(tsk->thread.regs);
+
 	if (is32) {
         	if (ksig.ka.sa.sa_flags & SA_SIGINFO)
 			ret = handle_rt_signal32(&ksig, oldset, tsk);
@@ -161,6 +163,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
 		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
+		rseq_handle_notify_resume(regs);
 	}
 
 	if (thread_info_flags & _TIF_PATCH_PENDING)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH for 4.15 07/14] Restartable sequences: Wire up powerpc system call
  2017-11-06 20:56 ` Mathieu Desnoyers
@ 2017-11-06 20:56   ` Mathieu Desnoyers
  -1 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev

From: Boqun Feng <boqun.feng@gmail.com>

Wire up the rseq system call on powerpc.

This provides an ABI improving the speed of a user-space getcpu
operation on powerpc by skipping the getcpu system call on the fast
path, as well as improving the speed of user-space operations on per-cpu
data compared to using load-reservation/store-conditional atomics.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/systbl.h      | 1 +
 arch/powerpc/include/asm/unistd.h      | 2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 449912f057f6..964321a5799c 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -389,3 +389,4 @@ COMPAT_SYS_SPU(preadv2)
 COMPAT_SYS_SPU(pwritev2)
 SYSCALL(kexec_file_load)
 SYSCALL(statx)
+SYSCALL(rseq)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index 9ba11dbcaca9..e76bd5601ea4 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include <uapi/asm/unistd.h>
 
 
-#define NR_syscalls		384
+#define NR_syscalls		385
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index df8684f31919..b1980fcd56d5 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -395,5 +395,6 @@
 #define __NR_pwritev2		381
 #define __NR_kexec_file_load	382
 #define __NR_statx		383
+#define __NR_rseq		384
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v2 for 4.15 08/14] Provide cpu_opv system call
  2017-11-06 20:56 ` Mathieu Desnoyers
                   ` (7 preceding siblings ...)
  (?)
@ 2017-11-06 20:56 ` Mathieu Desnoyers
  2017-11-07  2:07     ` Boqun Feng
  -1 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

This new cpu_opv system call executes a vector of operations on behalf
of user-space on a specific CPU with preemption disabled. It is inspired
by the readv() and writev() system calls, which take a "struct iovec"
array as argument.

The operations available are: comparison, memcpy, add, or, and, xor,
left shift, right shift, and mb. The system call receives a CPU number
from user-space as argument, which is the CPU on which those operations
need to be performed. All preparation steps such as loading pointers,
and applying offsets to arrays, need to be performed by user-space
before invoking the system call. The "comparison" operation can be used
to check that the data used in the preparation step did not change
between preparation of system call inputs and operation execution within
the preempt-off critical section.

The reason why we require all pointer offsets to be calculated by
user-space beforehand is because we need to use get_user_pages_fast() to
first pin all pages touched by each operation. This takes care of
faulting-in the pages. Then, preemption is disabled, and the operations
are performed atomically with respect to other thread execution on that
CPU, without generating any page fault.

A maximum limit of 16 operations per cpu_opv syscall invocation is
enforced, so user-space cannot generate an overly long preempt-off
critical section. Each operation is also limited to a length of
PAGE_SIZE bytes, meaning that an operation can touch a maximum of 4
pages (memcpy: 2 pages for source, 2 pages for destination if addresses
are not aligned on page boundaries). Moreover, a total limit of 4216
bytes is applied to the sum of operation lengths.
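
As an illustration only (this is not code from the patch), the three
limits above can be sketched as a user-space pre-check. The helper name
sketch_opv_lens_ok() and the SKETCH_PAGE_SIZE value of 4096 are
assumptions made for the example; the 16-operation and 4216-byte limits
are the ones quoted in this changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 4096-byte pages. */
#define SKETCH_PAGE_SIZE		4096u
/* Limits quoted from the changelog and the uapi header. */
#define CPU_OP_VEC_LEN_MAX		16
#define CPU_OP_DATA_LEN_MAX		SKETCH_PAGE_SIZE
#define CPU_OP_VEC_DATA_LEN_MAX		(4096 + 15 * 8)

/*
 * Hypothetical helper: return 1 if the per-operation lengths respect
 * the vector size limit, the per-operation length limit and the limit
 * on the sum of lengths; return 0 otherwise.
 */
static int sketch_opv_lens_ok(const uint32_t *lens, size_t nr_ops)
{
	uint64_t sum = 0;
	size_t i;

	assert(lens != NULL || nr_ops == 0);
	if (nr_ops > CPU_OP_VEC_LEN_MAX)
		return 0;
	for (i = 0; i < nr_ops; i++) {
		if (lens[i] > CPU_OP_DATA_LEN_MAX)
			return 0;
		sum += lens[i];
	}
	return sum <= CPU_OP_VEC_DATA_LEN_MAX;
}
```

One 4096-byte copy plus fifteen 8-byte operations sums to exactly 4216
bytes, so that vector passes this check, while one more byte anywhere
fails it.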

If the thread is not running on the requested CPU, a new
push_task_to_cpu() function is invoked to migrate the task to that CPU.
If the requested CPU is not part of the cpus allowed mask of the thread,
the system call fails with EINVAL. After the migration has been
performed, preemption is disabled, and the current CPU number is checked
again and compared to the requested CPU number. If it still differs, it
means the scheduler migrated us away from that CPU. Return EAGAIN to
user-space in that case, and let user-space retry (either requesting the
same CPU number, or a different one, depending on the user-space
algorithm constraints).
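
A minimal sketch of the user-space retry policy described above,
assuming the call returns -1 with errno set on failure (as the
syscall(2) wrapper does). The function-pointer type, the argument
order and the sketch_* names are hypothetical; the fake call only
stands in for the real system call so the loop can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for invoking cpu_opv: 0 on success, -1 + errno. */
typedef int (*sketch_cpu_opv_fn)(void *opv, int nr_ops, int cpu, int flags);

/* Retry while the call reports EAGAIN, up to max_tries attempts. */
static int sketch_cpu_opv_retry(sketch_cpu_opv_fn call, void *opv,
				int nr_ops, int cpu, int max_tries)
{
	int i, ret = -1;

	assert(call != NULL);
	errno = EAGAIN;		/* result reported if max_tries <= 0 */
	for (i = 0; i < max_tries; i++) {
		ret = call(opv, nr_ops, cpu, 0);
		if (ret == 0 || errno != EAGAIN)
			return ret;	/* success, or a real error */
	}
	return ret;
}

/* Fake call for illustration: fails with EAGAIN a set number of times. */
static int sketch_fake_eagain_left;

static int sketch_fake_cpu_opv(void *opv, int nr_ops, int cpu, int flags)
{
	(void)opv;
	(void)nr_ops;
	(void)flags;
	if (cpu < 0) {
		errno = EINVAL;	/* models a CPU outside the allowed mask */
		return -1;
	}
	if (sketch_fake_eagain_left > 0) {
		sketch_fake_eagain_left--;
		errno = EAGAIN;	/* models a migration before preempt-off */
		return -1;
	}
	return 0;
}
```

This sketch retries on the same CPU number; an algorithm that is not
tied to one CPU could pick a different one on each attempt.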

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Hunter <ahh@google.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---

Changes since v1:
- handle CPU hotplug,
- clean up implementation using function pointers: We can use function
  pointers to implement the operations rather than duplicating all the
  user-access code.
- refuse device pages: Performing cpu_opv operations on io map'd pages
  with preemption disabled could generate long preempt-off critical
  sections, which leads to unwanted scheduler latency. Return EFAULT if
  a device page is received as parameter.
- restrict op vector to 4216 bytes length sum: Restrict the operation
  vector to length sum of:
  - 4096 bytes (typical page size on most architectures, should be
    enough for a string, or structures)
  - 15 * 8 bytes (typical operations on integers or pointers).
  The goal here is to keep the duration of preempt off critical section
  short, so we don't add significant scheduler latency.
- Add INIT_ONSTACK macro: Introduce the
  CPU_OP_FIELD_u32_u64_INIT_ONSTACK() macros to ensure that users
  correctly initialize the upper bits of CPU_OP_FIELD_u32_u64() on their
  stack to 0 on 32-bit architectures.
- Add CPU_MB_OP operation:
  Use-cases with:
  - two consecutive stores,
  - a memcpy followed by a store,
  require a memory barrier before the final store operation. A typical
  use-case is a store-release on the final store. Given that this is a
  slow path, just providing an explicit full barrier instruction should
  be sufficient.
- Add expect fault field:
  The use-case of list_pop brings interesting challenges. With rseq, we
  can use rseq_cmpnev_storeoffp_load(), and therefore load a pointer,
  compare it against NULL, add an offset, and load the target "next"
  pointer from the object, all within a single rseq critical section.

  Life is not so easy for cpu_opv in this use-case, mainly because we
  need to pin all pages we are going to touch in the preempt-off
  critical section beforehand. So we need to know the target object (in
  which we apply an offset to fetch the next pointer) when we pin pages
  before disabling preemption.

  So the approach is to load the head pointer and compare it against
  NULL in user-space, before doing the cpu_opv syscall. User-space can
  then compute the address of the head->next field, *without loading it*.

  The cpu_opv system call will first need to pin all pages associated
  with input data. This includes the page backing the head->next object,
  which may have been concurrently deallocated and unmapped. Therefore,
  in this case, getting -EFAULT when trying to pin those pages may
  happen: it just means they have been concurrently unmapped. This is
  an expected situation, and should just return -EAGAIN to user-space,
  so user-space can distinguish between "should retry" type of
  situations and actual errors that should be handled with extreme
  prejudice to the program (e.g. abort()).

  Therefore, add "expect_fault" fields along with op input address
  pointers, so user-space can identify whether a fault when getting a
  field should return EAGAIN rather than EFAULT.
- Add compiler barrier between operations: Adding a compiler barrier
  between store operations in a cpu_opv sequence can be useful when
  paired with membarrier system call.

  An algorithm with a paired slow path and fast path can use
  sys_membarrier on the slow path to replace fast-path memory barriers
  by compiler barrier.

  Adding an explicit compiler barrier between operations allows
  cpu_opv to be used as fallback for operations meant to match
  the membarrier system call.
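
For illustration, the list_pop preparation described in the "expect
fault" item above could look roughly like the following. This is a
sketch, not code from the patch: struct sketch_cpu_op is a simplified,
64-bit-only mirror of part of struct cpu_op, all sketch_* names are
hypothetical, and it assumes that a compare operation compares "len"
bytes at addresses a and b.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified mirror of part of the uapi header, 64-bit layout only. */
enum sketch_op_type {
	SKETCH_COMPARE_EQ_OP,
	SKETCH_COMPARE_NE_OP,
	SKETCH_MEMCPY_OP,
};

struct sketch_cpu_op {
	int32_t op;
	uint32_t len;
	union {
		struct {
			uint64_t a, b;
			uint8_t expect_fault_a, expect_fault_b;
		} compare_op;
		struct {
			uint64_t dst, src;
			uint8_t expect_fault_dst, expect_fault_src;
		} memcpy_op;
	} u;
};

struct sketch_node {
	struct sketch_node *next;
};

/*
 * Fill ops[0..1] for a list pop. The head is loaded and compared
 * against NULL in user-space; the address of head->next is computed
 * but not loaded. Returns the number of operations, 0 if the list
 * looked empty. *expected must stay valid until the call completes.
 */
static int sketch_prepare_list_pop(struct sketch_node **head,
				   struct sketch_node **expected,
				   struct sketch_cpu_op ops[2])
{
	assert(head != NULL && expected != NULL);
	*expected = *head;
	if (*expected == NULL)
		return 0;
	memset(ops, 0, 2 * sizeof(ops[0]));
	/* Check that the head did not change since it was read. */
	ops[0].op = SKETCH_COMPARE_EQ_OP;
	ops[0].len = sizeof(*head);
	ops[0].u.compare_op.a = (uint64_t)(uintptr_t)head;
	ops[0].u.compare_op.b = (uint64_t)(uintptr_t)expected;
	/* head = expected->next; the source page may have been unmapped. */
	ops[1].op = SKETCH_MEMCPY_OP;
	ops[1].len = sizeof(*head);
	ops[1].u.memcpy_op.dst = (uint64_t)(uintptr_t)head;
	ops[1].u.memcpy_op.src = (uint64_t)(uintptr_t)&(*expected)->next;
	ops[1].u.memcpy_op.expect_fault_src = 1;
	return 2;
}
```

Only the source of the copy is flagged, since only the popped object
may have been freed and unmapped concurrently; a fault on the list head
itself would remain a real error. Real code would also need a suitable
atomic or volatile load of the head.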
---
 MAINTAINERS                  |   7 +
 include/uapi/linux/cpu_opv.h | 117 ++++++
 init/Kconfig                 |  14 +
 kernel/Makefile              |   1 +
 kernel/cpu_opv.c             | 952 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c          |  37 ++
 kernel/sched/sched.h         |   2 +
 kernel/sys_ni.c              |   1 +
 8 files changed, 1131 insertions(+)
 create mode 100644 include/uapi/linux/cpu_opv.h
 create mode 100644 kernel/cpu_opv.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 353366928ae8..6a428d6cf494 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3675,6 +3675,13 @@ B:	https://bugzilla.kernel.org
 F:	drivers/cpuidle/*
 F:	include/linux/cpuidle.h
 
+CPU NON-PREEMPTIBLE OPERATION VECTOR SUPPORT
+M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+L:	linux-kernel@vger.kernel.org
+S:	Supported
+F:	kernel/cpu_opv.c
+F:	include/uapi/linux/cpu_opv.h
+
 CRAMFS FILESYSTEM
 W:	http://sourceforge.net/projects/cramfs/
 S:	Orphan / Obsolete
diff --git a/include/uapi/linux/cpu_opv.h b/include/uapi/linux/cpu_opv.h
new file mode 100644
index 000000000000..17f7d46e053b
--- /dev/null
+++ b/include/uapi/linux/cpu_opv.h
@@ -0,0 +1,117 @@
+#ifndef _UAPI_LINUX_CPU_OPV_H
+#define _UAPI_LINUX_CPU_OPV_H
+
+/*
+ * linux/cpu_opv.h
+ *
+ * CPU preempt-off operation vector system call API
+ *
+ * Copyright (c) 2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifdef __KERNEL__
+# include <linux/types.h>
+#else	/* #ifdef __KERNEL__ */
+# include <stdint.h>
+#endif	/* #else #ifdef __KERNEL__ */
+
+#include <asm/byteorder.h>
+
+#ifdef __LP64__
+# define CPU_OP_FIELD_u32_u64(field)			uint64_t field
+# define CPU_OP_FIELD_u32_u64_INIT_ONSTACK(field, v)	field = (intptr_t)v
+#elif defined(__BYTE_ORDER) ? \
+	__BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
+# define CPU_OP_FIELD_u32_u64(field)	uint32_t field ## _padding, field
+# define CPU_OP_FIELD_u32_u64_INIT_ONSTACK(field, v)	\
+	field ## _padding = 0, field = (intptr_t)v
+#else
+# define CPU_OP_FIELD_u32_u64(field)	uint32_t field, field ## _padding
+# define CPU_OP_FIELD_u32_u64_INIT_ONSTACK(field, v)	\
+	field = (intptr_t)v, field ## _padding = 0
+#endif
+
+#define CPU_OP_VEC_LEN_MAX		16
+#define CPU_OP_ARG_LEN_MAX		24
+/* Max. data len per operation. */
+#define CPU_OP_DATA_LEN_MAX		PAGE_SIZE
+/*
+ * Max. data len for overall vector. We need to restrict the amount of
+ * user-space data touched by the kernel in non-preemptible context so
+ * we do not introduce long scheduler latencies.
+ * This allows one copy of up to 4096 bytes, and 15 operations touching
+ * 8 bytes each.
+ * This limit is applied to the sum of length specified for all
+ * operations in a vector.
+ */
+#define CPU_OP_VEC_DATA_LEN_MAX		(4096 + 15*8)
+#define CPU_OP_MAX_PAGES		4	/* Max. pages per op. */
+
+enum cpu_op_type {
+	CPU_COMPARE_EQ_OP,	/* compare */
+	CPU_COMPARE_NE_OP,	/* compare */
+	CPU_MEMCPY_OP,		/* memcpy */
+	CPU_ADD_OP,		/* arithmetic */
+	CPU_OR_OP,		/* bitwise */
+	CPU_AND_OP,		/* bitwise */
+	CPU_XOR_OP,		/* bitwise */
+	CPU_LSHIFT_OP,		/* shift */
+	CPU_RSHIFT_OP,		/* shift */
+	CPU_MB_OP,		/* memory barrier */
+};
+
+/* Vector of operations to perform. Limited to 16. */
+struct cpu_op {
+	int32_t op;	/* enum cpu_op_type. */
+	uint32_t len;	/* data length, in bytes. */
+	union {
+		struct {
+			CPU_OP_FIELD_u32_u64(a);
+			CPU_OP_FIELD_u32_u64(b);
+			uint8_t expect_fault_a;
+			uint8_t expect_fault_b;
+		} compare_op;
+		struct {
+			CPU_OP_FIELD_u32_u64(dst);
+			CPU_OP_FIELD_u32_u64(src);
+			uint8_t expect_fault_dst;
+			uint8_t expect_fault_src;
+		} memcpy_op;
+		struct {
+			CPU_OP_FIELD_u32_u64(p);
+			int64_t count;
+			uint8_t expect_fault_p;
+		} arithmetic_op;
+		struct {
+			CPU_OP_FIELD_u32_u64(p);
+			uint64_t mask;
+			uint8_t expect_fault_p;
+		} bitwise_op;
+		struct {
+			CPU_OP_FIELD_u32_u64(p);
+			uint32_t bits;
+			uint8_t expect_fault_p;
+		} shift_op;
+		char __padding[CPU_OP_ARG_LEN_MAX];
+	} u;
+};
+
+#endif /* _UAPI_LINUX_CPU_OPV_H */
diff --git a/init/Kconfig b/init/Kconfig
index cbedfb91b40a..e4fbb5dd6a24 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1404,6 +1404,7 @@ config RSEQ
 	bool "Enable rseq() system call" if EXPERT
 	default y
 	depends on HAVE_RSEQ
+	select CPU_OPV
 	select MEMBARRIER
 	help
 	  Enable the restartable sequences system call. It provides a
@@ -1414,6 +1415,19 @@ config RSEQ
 
 	  If unsure, say Y.
 
+config CPU_OPV
+	bool "Enable cpu_opv() system call" if EXPERT
+	default y
+	help
+	  Enable the CPU preempt-off operation vector system call.
+	  It allows user-space to perform a sequence of operations on
+	  per-cpu data with preemption disabled. Useful as
+	  single-stepping fall-back for restartable sequences, and for
+	  performing more complex operations on per-cpu data that would
+	  not be otherwise possible to do with restartable sequences.
+
+	  If unsure, say Y.
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
diff --git a/kernel/Makefile b/kernel/Makefile
index 3574669dafd9..cac8855196ff 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -113,6 +113,7 @@ obj-$(CONFIG_TORTURE_TEST) += torture.o
 
 obj-$(CONFIG_HAS_IOMEM) += memremap.o
 obj-$(CONFIG_RSEQ) += rseq.o
+obj-$(CONFIG_CPU_OPV) += cpu_opv.o
 
 $(obj)/configs.o: $(obj)/config_data.h
 
diff --git a/kernel/cpu_opv.c b/kernel/cpu_opv.c
new file mode 100644
index 000000000000..09754bbe6a4f
--- /dev/null
+++ b/kernel/cpu_opv.c
@@ -0,0 +1,952 @@
+/*
+ * CPU preempt-off operation vector system call
+ *
+ * It allows user-space to perform a sequence of operations on per-cpu
+ * data with preemption disabled. Useful as single-stepping fall-back
+ * for restartable sequences, and for performing more complex operations
+ * on per-cpu data that would not be otherwise possible to do with
+ * restartable sequences.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) 2017, EfficiOS Inc.,
+ * Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#include <linux/sched.h>
+#include <linux/uaccess.h>
+#include <linux/syscalls.h>
+#include <linux/cpu_opv.h>
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/pagemap.h>
+#include <asm/ptrace.h>
+#include <asm/byteorder.h>
+
+#include "sched/sched.h"
+
+#define TMP_BUFLEN			64
+#define NR_PINNED_PAGES_ON_STACK	8
+
+union op_fn_data {
+	uint8_t _u8;
+	uint16_t _u16;
+	uint32_t _u32;
+	uint64_t _u64;
+#if (BITS_PER_LONG < 64)
+	uint32_t _u64_split[2];
+#endif
+};
+
+typedef int (*op_fn_t)(union op_fn_data *data, uint64_t v, uint32_t len);
+
+static DEFINE_MUTEX(cpu_opv_offline_lock);
+
+/*
+ * The cpu_opv system call executes a vector of operations on behalf of
+ * user-space on a specific CPU with preemption disabled. It is inspired
+ * by the readv() and writev() system calls, which take a "struct iovec"
+ * array as argument.
+ *
+ * The operations available are: comparison, memcpy, add, or, and, xor,
+ * left shift, and right shift. The system call receives a CPU number
+ * from user-space as argument, which is the CPU on which those
+ * operations need to be performed. All preparation steps such as
+ * loading pointers, and applying offsets to arrays, need to be
+ * performed by user-space before invoking the system call. The
+ * "comparison" operation can be used to check that the data used in the
+ * preparation step did not change between preparation of system call
+ * inputs and operation execution within the preempt-off critical
+ * section.
+ *
+ * All pointer offsets must be calculated by user-space beforehand
+ * because get_user_pages_fast() is used to first pin all pages touched
+ * by each operation. This takes care of faulting-in the pages. Then,
+ * preemption is disabled, and the operations are performed atomically
+ * with respect to other thread execution on that CPU, without
+ * generating any page fault.
+ *
+ * A maximum of 16 operations per cpu_opv syscall invocation is
+ * enforced, as well as an overall maximum length sum, so user-space
+ * cannot generate an overly long preempt-off critical section. Each
+ * operation is also limited to a length of PAGE_SIZE bytes, meaning that
+ * an operation can touch a maximum of 4 pages (memcpy: 2 pages for
+ * source, 2 pages for destination if addresses are not aligned on page
+ * boundaries).
+ * If the thread is not running on the requested CPU, a new
+ * push_task_to_cpu() is invoked to migrate the task to the requested
+ * CPU.  If the requested CPU is not part of the cpus allowed mask of
+ * the thread, the system call fails with EINVAL. After the migration
+ * has been performed, preemption is disabled, and the current CPU
+ * number is checked again and compared to the requested CPU number. If
+ * it still differs, it means the scheduler migrated us away from that
+ * CPU. Return EAGAIN to user-space in that case, and let user-space
+ * retry (either requesting the same CPU number, or a different one,
+ * depending on the user-space algorithm constraints).
+ */
+
+/*
+ * Check operation types and length parameters.
+ */
+static int cpu_opv_check(struct cpu_op *cpuop, int cpuopcnt)
+{
+	int i;
+	uint32_t sum = 0;
+
+	for (i = 0; i < cpuopcnt; i++) {
+		struct cpu_op *op = &cpuop[i];
+
+		switch (op->op) {
+		case CPU_MB_OP:
+			break;
+		default:
+			sum += op->len;
+		}
+		switch (op->op) {
+		case CPU_COMPARE_EQ_OP:
+		case CPU_COMPARE_NE_OP:
+		case CPU_MEMCPY_OP:
+			if (op->len > CPU_OP_DATA_LEN_MAX)
+				return -EINVAL;
+			break;
+		case CPU_ADD_OP:
+		case CPU_OR_OP:
+		case CPU_AND_OP:
+		case CPU_XOR_OP:
+			switch (op->len) {
+			case 1:
+			case 2:
+			case 4:
+			case 8:
+				break;
+			default:
+				return -EINVAL;
+			}
+			break;
+		case CPU_LSHIFT_OP:
+		case CPU_RSHIFT_OP:
+			switch (op->len) {
+			case 1:
+				if (op->u.shift_op.bits > 7)
+					return -EINVAL;
+				break;
+			case 2:
+				if (op->u.shift_op.bits > 15)
+					return -EINVAL;
+				break;
+			case 4:
+				if (op->u.shift_op.bits > 31)
+					return -EINVAL;
+				break;
+			case 8:
+				if (op->u.shift_op.bits > 63)
+					return -EINVAL;
+				break;
+			default:
+				return -EINVAL;
+			}
+			break;
+		case CPU_MB_OP:
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+	if (sum > CPU_OP_VEC_DATA_LEN_MAX)
+		return -EINVAL;
+	return 0;
+}
+
+static unsigned long cpu_op_range_nr_pages(unsigned long addr,
+		unsigned long len)
+{
+	return ((addr + len - 1) >> PAGE_SHIFT) - (addr >> PAGE_SHIFT) + 1;
+}
+
+static int cpu_op_check_page(struct page *page)
+{
+	struct address_space *mapping;
+
+	if (is_zone_device_page(page))
+		return -EFAULT;
+	page = compound_head(page);
+	mapping = READ_ONCE(page->mapping);
+	if (!mapping) {
+		int shmem_swizzled;
+
+		/*
+		 * Check again with page lock held to guard against
+		 * memory pressure making shmem_writepage move the page
+		 * from filecache to swapcache.
+		 */
+		lock_page(page);
+		shmem_swizzled = PageSwapCache(page) || page->mapping;
+		unlock_page(page);
+		if (shmem_swizzled)
+			return -EAGAIN;
+		return -EFAULT;
+	}
+	return 0;
+}
+
+/*
+ * Refuse device pages, the zero page, pages in the gate area, and
+ * special mappings. Inspired by the futex.c checks.
+ */
+static int cpu_op_check_pages(struct page **pages,
+		unsigned long nr_pages)
+{
+	unsigned long i;
+
+	for (i = 0; i < nr_pages; i++) {
+		int ret;
+
+		ret = cpu_op_check_page(pages[i]);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
+		struct page ***pinned_pages_ptr, size_t *nr_pinned,
+		int write)
+{
+	struct page *pages[2];
+	int ret, nr_pages;
+
+	if (!len)
+		return 0;
+	nr_pages = cpu_op_range_nr_pages(addr, len);
+	BUG_ON(nr_pages > 2);
+	if (*nr_pinned + nr_pages > NR_PINNED_PAGES_ON_STACK) {
+		struct page **pinned_pages =
+			kzalloc(CPU_OP_VEC_LEN_MAX * CPU_OP_MAX_PAGES
+				* sizeof(struct page *), GFP_KERNEL);
+		if (!pinned_pages)
+			return -ENOMEM;
+		memcpy(pinned_pages, *pinned_pages_ptr,
+			*nr_pinned * sizeof(struct page *));
+		*pinned_pages_ptr = pinned_pages;
+	}
+again:
+	ret = get_user_pages_fast(addr, nr_pages, write, pages);
+	if (ret < nr_pages) {
+		if (ret > 0)
+			put_page(pages[0]);
+		return -EFAULT;
+	}
+	/*
+	 * Refuse device pages, the zero page, pages in the gate area,
+	 * and special mappings.
+	 */
+	ret = cpu_op_check_pages(pages, nr_pages);
+	if (ret == -EAGAIN) {
+		put_page(pages[0]);
+		if (nr_pages > 1)
+			put_page(pages[1]);
+		goto again;
+	}
+	if (ret)
+		goto error;
+	(*pinned_pages_ptr)[(*nr_pinned)++] = pages[0];
+	if (nr_pages > 1)
+		(*pinned_pages_ptr)[(*nr_pinned)++] = pages[1];
+	return 0;
+
+error:
+	put_page(pages[0]);
+	if (nr_pages > 1)
+		put_page(pages[1]);
+	return -EFAULT;
+}
+
+static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
+		struct page ***pinned_pages_ptr, size_t *nr_pinned)
+{
+	int ret, i;
+	bool expect_fault = false;
+
+	/* Check access, pin pages. */
+	for (i = 0; i < cpuopcnt; i++) {
+		struct cpu_op *op = &cpuop[i];
+
+		switch (op->op) {
+		case CPU_COMPARE_EQ_OP:
+		case CPU_COMPARE_NE_OP:
+			ret = -EFAULT;
+			expect_fault = op->u.compare_op.expect_fault_a;
+			if (!access_ok(VERIFY_READ, op->u.compare_op.a,
+					op->len))
+				goto error;
+			ret = cpu_op_pin_pages(
+					(unsigned long)op->u.compare_op.a,
+					op->len, pinned_pages_ptr, nr_pinned, 0);
+			if (ret)
+				goto error;
+			ret = -EFAULT;
+			expect_fault = op->u.compare_op.expect_fault_b;
+			if (!access_ok(VERIFY_READ, op->u.compare_op.b,
+					op->len))
+				goto error;
+			ret = cpu_op_pin_pages(
+					(unsigned long)op->u.compare_op.b,
+					op->len, pinned_pages_ptr, nr_pinned, 0);
+			if (ret)
+				goto error;
+			break;
+		case CPU_MEMCPY_OP:
+			ret = -EFAULT;
+			expect_fault = op->u.memcpy_op.expect_fault_dst;
+			if (!access_ok(VERIFY_WRITE, op->u.memcpy_op.dst,
+					op->len))
+				goto error;
+			ret = cpu_op_pin_pages(
+					(unsigned long)op->u.memcpy_op.dst,
+					op->len, pinned_pages_ptr, nr_pinned, 1);
+			if (ret)
+				goto error;
+			ret = -EFAULT;
+			expect_fault = op->u.memcpy_op.expect_fault_src;
+			if (!access_ok(VERIFY_READ, op->u.memcpy_op.src,
+					op->len))
+				goto error;
+			ret = cpu_op_pin_pages(
+					(unsigned long)op->u.memcpy_op.src,
+					op->len, pinned_pages_ptr, nr_pinned, 0);
+			if (ret)
+				goto error;
+			break;
+		case CPU_ADD_OP:
+			ret = -EFAULT;
+			expect_fault = op->u.arithmetic_op.expect_fault_p;
+			if (!access_ok(VERIFY_WRITE, op->u.arithmetic_op.p,
+					op->len))
+				goto error;
+			ret = cpu_op_pin_pages(
+					(unsigned long)op->u.arithmetic_op.p,
+					op->len, pinned_pages_ptr, nr_pinned, 1);
+			if (ret)
+				goto error;
+			break;
+		case CPU_OR_OP:
+		case CPU_AND_OP:
+		case CPU_XOR_OP:
+			ret = -EFAULT;
+			expect_fault = op->u.bitwise_op.expect_fault_p;
+			if (!access_ok(VERIFY_WRITE, op->u.bitwise_op.p,
+					op->len))
+				goto error;
+			ret = cpu_op_pin_pages(
+					(unsigned long)op->u.bitwise_op.p,
+					op->len, pinned_pages_ptr, nr_pinned, 1);
+			if (ret)
+				goto error;
+			break;
+		case CPU_LSHIFT_OP:
+		case CPU_RSHIFT_OP:
+			ret = -EFAULT;
+			expect_fault = op->u.shift_op.expect_fault_p;
+			if (!access_ok(VERIFY_WRITE, op->u.shift_op.p,
+					op->len))
+				goto error;
+			ret = cpu_op_pin_pages(
+					(unsigned long)op->u.shift_op.p,
+					op->len, pinned_pages_ptr, nr_pinned, 1);
+			if (ret)
+				goto error;
+			break;
+		case CPU_MB_OP:
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+	return 0;
+
+error:
+	for (i = 0; i < *nr_pinned; i++)
+		put_page((*pinned_pages_ptr)[i]);
+	*nr_pinned = 0;
+	/*
+	 * If faulting access is expected, return EAGAIN to user-space.
+	 * It allows user-space to distinguish a fault caused by an
+	 * access which is expected to fault (e.g. due to concurrent
+	 * unmapping of underlying memory) from an unexpected fault from
+	 * which a retry would not recover.
+	 */
+	if (ret == -EFAULT && expect_fault)
+		return -EAGAIN;
+	return ret;
+}
+
+/* Return 0 if same, > 0 if different, < 0 on error. */
+static int do_cpu_op_compare_iter(void __user *a, void __user *b, uint32_t len)
+{
+	char bufa[TMP_BUFLEN], bufb[TMP_BUFLEN];
+	uint32_t compared = 0;
+
+	while (compared != len) {
+		unsigned long to_compare;
+
+		to_compare = min_t(uint32_t, TMP_BUFLEN, len - compared);
+		if (__copy_from_user_inatomic(bufa, a + compared, to_compare))
+			return -EFAULT;
+		if (__copy_from_user_inatomic(bufb, b + compared, to_compare))
+			return -EFAULT;
+		if (memcmp(bufa, bufb, to_compare))
+			return 1;	/* different */
+		compared += to_compare;
+	}
+	return 0;	/* same */
+}
+
+/* Return 0 if same, > 0 if different, < 0 on error. */
+static int do_cpu_op_compare(void __user *a, void __user *b, uint32_t len)
+{
+	int ret = -EFAULT;
+	union {
+		uint8_t _u8;
+		uint16_t _u16;
+		uint32_t _u32;
+		uint64_t _u64;
+#if (BITS_PER_LONG < 64)
+		uint32_t _u64_split[2];
+#endif
+	} tmp[2];
+
+	pagefault_disable();
+	switch (len) {
+	case 1:
+		if (__get_user(tmp[0]._u8, (uint8_t __user *)a))
+			goto end;
+		if (__get_user(tmp[1]._u8, (uint8_t __user *)b))
+			goto end;
+		ret = !!(tmp[0]._u8 != tmp[1]._u8);
+		break;
+	case 2:
+		if (__get_user(tmp[0]._u16, (uint16_t __user *)a))
+			goto end;
+		if (__get_user(tmp[1]._u16, (uint16_t __user *)b))
+			goto end;
+		ret = !!(tmp[0]._u16 != tmp[1]._u16);
+		break;
+	case 4:
+		if (__get_user(tmp[0]._u32, (uint32_t __user *)a))
+			goto end;
+		if (__get_user(tmp[1]._u32, (uint32_t __user *)b))
+			goto end;
+		ret = !!(tmp[0]._u32 != tmp[1]._u32);
+		break;
+	case 8:
+#if (BITS_PER_LONG >= 64)
+		if (__get_user(tmp[0]._u64, (uint64_t __user *)a))
+			goto end;
+		if (__get_user(tmp[1]._u64, (uint64_t __user *)b))
+			goto end;
+#else
+		if (__get_user(tmp[0]._u64_split[0], (uint32_t __user *)a))
+			goto end;
+		if (__get_user(tmp[0]._u64_split[1], (uint32_t __user *)a + 1))
+			goto end;
+		if (__get_user(tmp[1]._u64_split[0], (uint32_t __user *)b))
+			goto end;
+		if (__get_user(tmp[1]._u64_split[1], (uint32_t __user *)b + 1))
+			goto end;
+#endif
+		ret = !!(tmp[0]._u64 != tmp[1]._u64);
+		break;
+	default:
+		pagefault_enable();
+		return do_cpu_op_compare_iter(a, b, len);
+	}
+end:
+	pagefault_enable();
+	return ret;
+}
+
+/* Return 0 on success, < 0 on error. */
+static int do_cpu_op_memcpy_iter(void __user *dst, void __user *src,
+		uint32_t len)
+{
+	char buf[TMP_BUFLEN];
+	uint32_t copied = 0;
+
+	while (copied != len) {
+		unsigned long to_copy;
+
+		to_copy = min_t(uint32_t, TMP_BUFLEN, len - copied);
+		if (__copy_from_user_inatomic(buf, src + copied, to_copy))
+			return -EFAULT;
+		if (__copy_to_user_inatomic(dst + copied, buf, to_copy))
+			return -EFAULT;
+		copied += to_copy;
+	}
+	return 0;
+}
+
+/* Return 0 on success, < 0 on error. */
+static int do_cpu_op_memcpy(void __user *dst, void __user *src, uint32_t len)
+{
+	int ret = -EFAULT;
+	union {
+		uint8_t _u8;
+		uint16_t _u16;
+		uint32_t _u32;
+		uint64_t _u64;
+#if (BITS_PER_LONG < 64)
+		uint32_t _u64_split[2];
+#endif
+	} tmp;
+
+	pagefault_disable();
+	switch (len) {
+	case 1:
+		if (__get_user(tmp._u8, (uint8_t __user *)src))
+			goto end;
+		if (__put_user(tmp._u8, (uint8_t __user *)dst))
+			goto end;
+		break;
+	case 2:
+		if (__get_user(tmp._u16, (uint16_t __user *)src))
+			goto end;
+		if (__put_user(tmp._u16, (uint16_t __user *)dst))
+			goto end;
+		break;
+	case 4:
+		if (__get_user(tmp._u32, (uint32_t __user *)src))
+			goto end;
+		if (__put_user(tmp._u32, (uint32_t __user *)dst))
+			goto end;
+		break;
+	case 8:
+#if (BITS_PER_LONG >= 64)
+		if (__get_user(tmp._u64, (uint64_t __user *)src))
+			goto end;
+		if (__put_user(tmp._u64, (uint64_t __user *)dst))
+			goto end;
+#else
+		if (__get_user(tmp._u64_split[0], (uint32_t __user *)src))
+			goto end;
+		if (__get_user(tmp._u64_split[1], (uint32_t __user *)src + 1))
+			goto end;
+		if (__put_user(tmp._u64_split[0], (uint32_t __user *)dst))
+			goto end;
+		if (__put_user(tmp._u64_split[1], (uint32_t __user *)dst + 1))
+			goto end;
+#endif
+		break;
+	default:
+		pagefault_enable();
+		return do_cpu_op_memcpy_iter(dst, src, len);
+	}
+	ret = 0;
+end:
+	pagefault_enable();
+	return ret;
+}
+
+static int op_add_fn(union op_fn_data *data, uint64_t count, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 += (uint8_t)count;
+		break;
+	case 2:
+		data->_u16 += (uint16_t)count;
+		break;
+	case 4:
+		data->_u32 += (uint32_t)count;
+		break;
+	case 8:
+		data->_u64 += (uint64_t)count;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int op_or_fn(union op_fn_data *data, uint64_t mask, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 |= (uint8_t)mask;
+		break;
+	case 2:
+		data->_u16 |= (uint16_t)mask;
+		break;
+	case 4:
+		data->_u32 |= (uint32_t)mask;
+		break;
+	case 8:
+		data->_u64 |= (uint64_t)mask;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int op_and_fn(union op_fn_data *data, uint64_t mask, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 &= (uint8_t)mask;
+		break;
+	case 2:
+		data->_u16 &= (uint16_t)mask;
+		break;
+	case 4:
+		data->_u32 &= (uint32_t)mask;
+		break;
+	case 8:
+		data->_u64 &= (uint64_t)mask;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int op_xor_fn(union op_fn_data *data, uint64_t mask, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 ^= (uint8_t)mask;
+		break;
+	case 2:
+		data->_u16 ^= (uint16_t)mask;
+		break;
+	case 4:
+		data->_u32 ^= (uint32_t)mask;
+		break;
+	case 8:
+		data->_u64 ^= (uint64_t)mask;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int op_lshift_fn(union op_fn_data *data, uint64_t bits, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 <<= (uint8_t)bits;
+		break;
+	case 2:
+		data->_u16 <<= (uint16_t)bits;
+		break;
+	case 4:
+		data->_u32 <<= (uint32_t)bits;
+		break;
+	case 8:
+		data->_u64 <<= (uint64_t)bits;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int op_rshift_fn(union op_fn_data *data, uint64_t bits, uint32_t len)
+{
+	int ret = 0;
+
+	switch (len) {
+	case 1:
+		data->_u8 >>= (uint8_t)bits;
+		break;
+	case 2:
+		data->_u16 >>= (uint16_t)bits;
+		break;
+	case 4:
+		data->_u32 >>= (uint32_t)bits;
+		break;
+	case 8:
+		data->_u64 >>= (uint64_t)bits;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+/* Return 0 on success, < 0 on error. */
+static int do_cpu_op_fn(op_fn_t op_fn, void __user *p, uint64_t v,
+		uint32_t len)
+{
+	int ret = -EFAULT;
+	union op_fn_data tmp;
+
+	pagefault_disable();
+	switch (len) {
+	case 1:
+		if (__get_user(tmp._u8, (uint8_t __user *)p))
+			goto end;
+		if (op_fn(&tmp, v, len))
+			goto end;
+		if (__put_user(tmp._u8, (uint8_t __user *)p))
+			goto end;
+		break;
+	case 2:
+		if (__get_user(tmp._u16, (uint16_t __user *)p))
+			goto end;
+		if (op_fn(&tmp, v, len))
+			goto end;
+		if (__put_user(tmp._u16, (uint16_t __user *)p))
+			goto end;
+		break;
+	case 4:
+		if (__get_user(tmp._u32, (uint32_t __user *)p))
+			goto end;
+		if (op_fn(&tmp, v, len))
+			goto end;
+		if (__put_user(tmp._u32, (uint32_t __user *)p))
+			goto end;
+		break;
+	case 8:
+#if (BITS_PER_LONG >= 64)
+		if (__get_user(tmp._u64, (uint64_t __user *)p))
+			goto end;
+#else
+		if (__get_user(tmp._u64_split[0], (uint32_t __user *)p))
+			goto end;
+		if (__get_user(tmp._u64_split[1], (uint32_t __user *)p + 1))
+			goto end;
+#endif
+		if (op_fn(&tmp, v, len))
+			goto end;
+#if (BITS_PER_LONG >= 64)
+		if (__put_user(tmp._u64, (uint64_t __user *)p))
+			goto end;
+#else
+		if (__put_user(tmp._u64_split[0], (uint32_t __user *)p))
+			goto end;
+		if (__put_user(tmp._u64_split[1], (uint32_t __user *)p + 1))
+			goto end;
+#endif
+		break;
+	default:
+		ret = -EINVAL;
+		goto end;
+	}
+	ret = 0;
+end:
+	pagefault_enable();
+	return ret;
+}
+
+static int __do_cpu_opv(struct cpu_op *cpuop, int cpuopcnt)
+{
+	int i, ret;
+
+	for (i = 0; i < cpuopcnt; i++) {
+		struct cpu_op *op = &cpuop[i];
+
+		/* Guarantee a compiler barrier between each operation. */
+		barrier();
+
+		switch (op->op) {
+		case CPU_COMPARE_EQ_OP:
+			ret = do_cpu_op_compare(
+					(void __user *)op->u.compare_op.a,
+					(void __user *)op->u.compare_op.b,
+					op->len);
+			/* Stop execution on error. */
+			if (ret < 0)
+				return ret;
+			/*
+			 * Stop execution, return op index + 1 if comparison
+			 * differs.
+			 */
+			if (ret > 0)
+				return i + 1;
+			break;
+		case CPU_COMPARE_NE_OP:
+			ret = do_cpu_op_compare(
+					(void __user *)op->u.compare_op.a,
+					(void __user *)op->u.compare_op.b,
+					op->len);
+			/* Stop execution on error. */
+			if (ret < 0)
+				return ret;
+			/*
+			 * Stop execution, return op index + 1 if comparison
+			 * is identical.
+			 */
+			if (ret == 0)
+				return i + 1;
+			break;
+		case CPU_MEMCPY_OP:
+			ret = do_cpu_op_memcpy(
+					(void __user *)op->u.memcpy_op.dst,
+					(void __user *)op->u.memcpy_op.src,
+					op->len);
+			/* Stop execution on error. */
+			if (ret)
+				return ret;
+			break;
+		case CPU_ADD_OP:
+			ret = do_cpu_op_fn(op_add_fn,
+					(void __user *)op->u.arithmetic_op.p,
+					op->u.arithmetic_op.count, op->len);
+			/* Stop execution on error. */
+			if (ret)
+				return ret;
+			break;
+		case CPU_OR_OP:
+			ret = do_cpu_op_fn(op_or_fn,
+					(void __user *)op->u.bitwise_op.p,
+					op->u.bitwise_op.mask, op->len);
+			/* Stop execution on error. */
+			if (ret)
+				return ret;
+			break;
+		case CPU_AND_OP:
+			ret = do_cpu_op_fn(op_and_fn,
+					(void __user *)op->u.bitwise_op.p,
+					op->u.bitwise_op.mask, op->len);
+			/* Stop execution on error. */
+			if (ret)
+				return ret;
+			break;
+		case CPU_XOR_OP:
+			ret = do_cpu_op_fn(op_xor_fn,
+					(void __user *)op->u.bitwise_op.p,
+					op->u.bitwise_op.mask, op->len);
+			/* Stop execution on error. */
+			if (ret)
+				return ret;
+			break;
+		case CPU_LSHIFT_OP:
+			ret = do_cpu_op_fn(op_lshift_fn,
+					(void __user *)op->u.shift_op.p,
+					op->u.shift_op.bits, op->len);
+			/* Stop execution on error. */
+			if (ret)
+				return ret;
+			break;
+		case CPU_RSHIFT_OP:
+			ret = do_cpu_op_fn(op_rshift_fn,
+					(void __user *)op->u.shift_op.p,
+					op->u.shift_op.bits, op->len);
+			/* Stop execution on error. */
+			if (ret)
+				return ret;
+			break;
+		case CPU_MB_OP:
+			smp_mb();
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static int do_cpu_opv(struct cpu_op *cpuop, int cpuopcnt, int cpu)
+{
+	int ret;
+
+	if (cpu != raw_smp_processor_id()) {
+		ret = push_task_to_cpu(current, cpu);
+		if (ret)
+			goto check_online;
+	}
+	preempt_disable();
+	if (cpu != smp_processor_id()) {
+		ret = -EAGAIN;
+		goto end;
+	}
+	ret = __do_cpu_opv(cpuop, cpuopcnt);
+end:
+	preempt_enable();
+	return ret;
+
+check_online:
+	if (!cpu_possible(cpu))
+		return -EINVAL;
+	get_online_cpus();
+	if (cpu_online(cpu)) {
+		ret = -EAGAIN;
+		goto put_online_cpus;
+	}
+	/*
+	 * CPU is offline. Perform operation from the current CPU with
+	 * cpu_online read lock held, preventing that CPU from coming online,
+	 * and with mutex held, providing mutual exclusion against other
+	 * CPUs also finding out about an offline CPU.
+	 */
+	mutex_lock(&cpu_opv_offline_lock);
+	ret = __do_cpu_opv(cpuop, cpuopcnt);
+	mutex_unlock(&cpu_opv_offline_lock);
+put_online_cpus:
+	put_online_cpus();
+	return ret;
+}
+
+/*
+ * cpu_opv - execute operation vector on a given CPU with preempt off.
+ *
+ * Userspace should pass current CPU number as parameter. May fail with
+ * -EAGAIN if currently executing on the wrong CPU.
+ */
+SYSCALL_DEFINE4(cpu_opv, struct cpu_op __user *, ucpuopv, int, cpuopcnt,
+		int, cpu, int, flags)
+{
+	struct cpu_op cpuopv[CPU_OP_VEC_LEN_MAX];
+	struct page *pinned_pages_on_stack[NR_PINNED_PAGES_ON_STACK];
+	struct page **pinned_pages = pinned_pages_on_stack;
+	int ret, i;
+	size_t nr_pinned = 0;
+
+	if (unlikely(flags))
+		return -EINVAL;
+	if (unlikely(cpu < 0))
+		return -EINVAL;
+	if (cpuopcnt < 0 || cpuopcnt > CPU_OP_VEC_LEN_MAX)
+		return -EINVAL;
+	if (copy_from_user(cpuopv, ucpuopv, cpuopcnt * sizeof(struct cpu_op)))
+		return -EFAULT;
+	ret = cpu_opv_check(cpuopv, cpuopcnt);
+	if (ret)
+		return ret;
+	ret = cpu_opv_pin_pages(cpuopv, cpuopcnt,
+				&pinned_pages, &nr_pinned);
+	if (ret)
+		goto end;
+	ret = do_cpu_opv(cpuopv, cpuopcnt, cpu);
+	for (i = 0; i < nr_pinned; i++)
+		put_page(pinned_pages[i]);
+end:
+	if (pinned_pages != pinned_pages_on_stack)
+		kfree(pinned_pages);
+	return ret;
+}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6bba05f47e51..e547f93a46c2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1052,6 +1052,43 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 		set_curr_task(rq, p);
 }
 
+int push_task_to_cpu(struct task_struct *p, unsigned int dest_cpu)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+	int ret = 0;
+
+	rq = task_rq_lock(p, &rf);
+	update_rq_clock(rq);
+
+	if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (task_cpu(p) == dest_cpu)
+		goto out;
+
+	if (task_running(rq, p) || p->state == TASK_WAKING) {
+		struct migration_arg arg = { p, dest_cpu };
+		/* Need help from migration thread: drop lock and wait. */
+		task_rq_unlock(rq, p, &rf);
+		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+		tlb_migrate_finish(p->mm);
+		return 0;
+	} else if (task_on_rq_queued(p)) {
+		/*
+		 * OK, since we're going to drop the lock immediately
+		 * afterwards anyway.
+		 */
+		rq = move_queued_task(rq, &rf, p, dest_cpu);
+	}
+out:
+	task_rq_unlock(rq, p, &rf);
+
+	return ret;
+}
+
 /*
  * Change a given task's CPU affinity. Migrate the thread to a
  * proper CPU and schedule it away if the CPU it's executing on
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3b448ba82225..cab256c1720a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1209,6 +1209,8 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 #endif
 }
 
+int push_task_to_cpu(struct task_struct *p, unsigned int dest_cpu);
+
 /*
  * Tunables that become constants when CONFIG_SCHED_DEBUG is off:
  */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index bfa1ee1bf669..59e622296dc3 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -262,3 +262,4 @@ cond_syscall(sys_pkey_free);
 
 /* restartable sequence */
 cond_syscall(sys_rseq);
+cond_syscall(sys_cpu_opv);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH for 4.15 09/14] cpu_opv: Wire up x86 32/64 system call
@ 2017-11-06 20:56   ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Hunter <ahh@google.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ba43ee75e425..afc6988fb2c8 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -392,3 +392,4 @@
 383	i386	statx			sys_statx
 384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
 385	i386	rseq			sys_rseq
+386	i386	cpu_opv			sys_cpu_opv
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 3ad03495bbb9..ab5d1f9f9396 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -340,6 +340,7 @@
 331	common	pkey_free		sys_pkey_free
 332	common	statx			sys_statx
 333	common	rseq			sys_rseq
+334	common	cpu_opv			sys_cpu_opv
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH for 4.15 10/14] cpu_opv: Wire up powerpc system call
@ 2017-11-06 20:56   ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/systbl.h      | 1 +
 arch/powerpc/include/asm/unistd.h      | 2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 964321a5799c..f9cdb896fbaa 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -390,3 +390,4 @@ COMPAT_SYS_SPU(pwritev2)
 SYSCALL(kexec_file_load)
 SYSCALL(statx)
 SYSCALL(rseq)
+SYSCALL(cpu_opv)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index e76bd5601ea4..48f80f452e31 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include <uapi/asm/unistd.h>
 
 
-#define NR_syscalls		385
+#define NR_syscalls		386
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index b1980fcd56d5..972a7d68c143 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -396,5 +396,6 @@
 #define __NR_kexec_file_load	382
 #define __NR_statx		383
 #define __NR_rseq		384
+#define __NR_cpu_opv		385
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH for 4.15 11/14] cpu_opv: Wire up ARM32 system call
  2017-11-06 20:56 ` Mathieu Desnoyers
                   ` (10 preceding siblings ...)
  (?)
@ 2017-11-06 20:56 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 arch/arm/tools/syscall.tbl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index fbc74b5fa3ed..213ccfc2c437 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -413,3 +413,4 @@
 396	common	pkey_free		sys_pkey_free
 397	common	statx			sys_statx
 398	common	rseq			sys_rseq
+399	common	cpu_opv			sys_cpu_opv
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH v2 for 4.15 12/14] cpu_opv: Implement selftests
  2017-11-06 20:56 ` Mathieu Desnoyers
                   ` (11 preceding siblings ...)
  (?)
@ 2017-11-06 20:56 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---

Changes since v1:

- Expose a library API similar to rseq's: the cpu-op library API now
  closely matches the rseq APIs, following removal of the event counter
  from the rseq kernel API.
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/cpu-opv/.gitignore         |    1 +
 tools/testing/selftests/cpu-opv/Makefile           |   15 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c | 1154 ++++++++++++++++++++
 tools/testing/selftests/cpu-opv/cpu-op.c           |  348 ++++++
 tools/testing/selftests/cpu-opv/cpu-op.h           |   68 ++
 7 files changed, 1588 insertions(+)
 create mode 100644 tools/testing/selftests/cpu-opv/.gitignore
 create mode 100644 tools/testing/selftests/cpu-opv/Makefile
 create mode 100644 tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.c
 create mode 100644 tools/testing/selftests/cpu-opv/cpu-op.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6a428d6cf494..54e11f0569e0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3681,6 +3681,7 @@ L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	kernel/cpu_opv.c
 F:	include/uapi/linux/cpu_opv.h
+F:	tools/testing/selftests/cpu-opv/
 
 CRAMFS FILESYSTEM
 W:	http://sourceforge.net/projects/cramfs/
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 3c9c0bbe7dbb..c66e5e67cfab 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -4,6 +4,7 @@ TARGETS += breakpoints
 TARGETS += capabilities
 TARGETS += cpufreq
 TARGETS += cpu-hotplug
+TARGETS += cpu-opv
 TARGETS += efivarfs
 TARGETS += exec
 TARGETS += firmware
diff --git a/tools/testing/selftests/cpu-opv/.gitignore b/tools/testing/selftests/cpu-opv/.gitignore
new file mode 100644
index 000000000000..c7186eb95cf5
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/.gitignore
@@ -0,0 +1 @@
+basic_cpu_opv_test
diff --git a/tools/testing/selftests/cpu-opv/Makefile b/tools/testing/selftests/cpu-opv/Makefile
new file mode 100644
index 000000000000..afb02f301567
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/Makefile
@@ -0,0 +1,15 @@
+CFLAGS += -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+
+TEST_GEN_PROGS = basic_cpu_opv_test libcpu-op.so
+
+ALL: $(TEST_GEN_PROGS)
+
+libcpu-op.so: cpu-op.c cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< -o $@
+
+# Own recipe because we only want to build against 1st prerequisite, but
+# still track changes to header files.
+%: %.c libcpu-op.so cpu-op.h
+	$(CC) $(CFLAGS) $< -lcpu-op -o $@
+
+include ../lib.mk
diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
new file mode 100644
index 000000000000..23072dcf5612
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
@@ -0,0 +1,1154 @@
+/*
+ * Basic test coverage for cpu_opv system call.
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+#include <errno.h>
+#include <stdlib.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+#define TESTBUFLEN	4096
+#define TESTBUFLEN_CMP	16
+
+static int test_compare_eq_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_eq_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq same";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test compare_eq */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret > 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_eq_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_eq different";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_eq_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_ne_op(char *a, char *b, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_compare_ne_same(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne same";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test compare_ne */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf2[i] = (char)i;
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_compare_ne_diff(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_compare_ne different";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_compare_ne_op(buf2, buf1, TESTBUFLEN);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_2compare_eq_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_eq_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_eq index";
+
+	printf("Testing %s\n", test_name);
+
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare failure is op[0], expect 1. */
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 1) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compares succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf2[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_eq_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 2) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 2);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int test_2compare_ne_op(char *a, char *b, char *c, char *d,
+		size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, a),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, b),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_NE_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.a, c),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.compare_op.b, d),
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_2compare_ne_index(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN_CMP];
+	char buf2[TESTBUFLEN_CMP];
+	char buf3[TESTBUFLEN_CMP];
+	char buf4[TESTBUFLEN_CMP];
+	const char *test_name = "test_2compare_ne index";
+
+	printf("Testing %s\n", test_name);
+
+	memset(buf1, 0, TESTBUFLEN_CMP);
+	memset(buf2, 0, TESTBUFLEN_CMP);
+	memset(buf3, 0, TESTBUFLEN_CMP);
+	memset(buf4, 0, TESTBUFLEN_CMP);
+
+	/* First compare ne failure is op[0], expect 1. */
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 1) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+
+	/* All compare ne succeed. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf1[i] = (char)i;
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf3[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+
+	/* First compare failure is op[1], expect 2. */
+	for (i = 0; i < TESTBUFLEN_CMP; i++)
+		buf4[i] = (char)i;
+	ret = test_2compare_ne_op(buf2, buf1, buf4, buf3, TESTBUFLEN_CMP);
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret != 2) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 2);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int test_memcpy_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	for (i = 0; i < TESTBUFLEN; i++) {
+		if (buf2[i] != (char)i) {
+			printf("%s failed. Expecting '%d', found '%d' at offset %d\n",
+				test_name, (char)i, buf2[i], i);
+			return -1;
+		}
+	}
+	return 0;
+}
+
+static int test_memcpy_u32(void)
+{
+	int ret;
+	uint32_t v1, v2;
+	const char *test_name = "test_memcpy_u32";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy_u32 */
+	v1 = 42;
+	v2 = 0;
+	ret = test_memcpy_op(&v2, &v1, sizeof(v1));
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (v1 != v2) {
+		printf("%s failed. Expecting '%u', found '%u'\n",
+			test_name, v1, v2);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_memcpy_mb_memcpy_op(void *dst1, void *src1,
+		void *dst2, void *src2, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst1),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src1),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MB_OP,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst2),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src2),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_mb_memcpy(void)
+{
+	int ret;
+	int v1, v2, v3;
+	const char *test_name = "test_memcpy_mb_memcpy";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	v1 = 42;
+	v2 = v3 = 0;
+	ret = test_memcpy_mb_memcpy_op(&v2, &v1, &v3, &v2, sizeof(int));
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (v3 != v1) {
+		printf("%s failed. Expecting '%d', found '%d'\n",
+			test_name, v1, v3);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_add_op(int *v, int64_t increment)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_add(v, increment, sizeof(*v), cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_add(void)
+{
+	int orig_v = 42, v, ret;
+	int increment = 1;
+	const char *test_name = "test_add";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_add_op(&v, increment);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increment) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v + increment);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_two_add_op(int *v, int64_t *increments)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[0],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+		[1] = {
+			.op = CPU_ADD_OP,
+			.len = sizeof(*v),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(
+				.u.arithmetic_op.p, v),
+			.u.arithmetic_op.count = increments[1],
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_two_add(void)
+{
+	int orig_v = 42, v, ret;
+	int64_t increments[2] = { 99, 123 };
+	const char *test_name = "test_two_add";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_two_add_op(&v, increments);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != orig_v + increments[0] + increments[1]) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, (int)(orig_v + increments[0] + increments[1]));
+		return -1;
+	}
+	return 0;
+}
+
+static int test_or_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_OR_OP,
+			.len = sizeof(*v),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_or(void)
+{
+	int orig_v = 0xFF00000, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_or";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_or_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v | mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v | mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_and_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_AND_OP,
+			.len = sizeof(*v),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_and(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_and";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_and_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v & mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v & mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_xor_op(int *v, uint64_t mask)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_XOR_OP,
+			.len = sizeof(*v),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(
+				.u.bitwise_op.p, v),
+			.u.bitwise_op.mask = mask,
+			.u.bitwise_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_xor(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t mask = 0xFFF;
+	const char *test_name = "test_xor";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_xor_op(&v, mask);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v ^ mask)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v ^ mask);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_lshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_LSHIFT_OP,
+			.len = sizeof(*v),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_lshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_lshift";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_lshift_op(&v, bits);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v << bits)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v << bits);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_rshift_op(int *v, uint32_t bits)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_RSHIFT_OP,
+			.len = sizeof(*v),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(
+				.u.shift_op.p, v),
+			.u.shift_op.bits = bits,
+			.u.shift_op.expect_fault_p = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_rshift(void)
+{
+	int orig_v = 0xF00, v, ret;
+	uint32_t bits = 5;
+	const char *test_name = "test_rshift";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_rshift_op(&v, bits);
+	if (ret) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		return -1;
+	}
+	if (v != (orig_v >> bits)) {
+		printf("%s unexpected value: %d. Should be %d.\n",
+			test_name, v, orig_v >> bits);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_cmpxchg_op(void *v, void *expect, void *old, void *n,
+		size_t len)
+{
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_op_cmpxchg(v, expect, old, n, len, cpu);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+
+static int test_cmpxchg_success(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 1, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg success";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 0);
+		return -1;
+	}
+	if (v != n) {
+		printf("%s v is %lld, expecting %lld\n",
+			test_name, (long long)v, (long long)n);
+		return -1;
+	}
+	if (old != orig_v) {
+		printf("%s old is %lld, expecting %lld\n",
+			test_name, (long long)old, (long long)orig_v);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_cmpxchg_fail(void)
+{
+	int ret;
+	uint64_t orig_v = 1, v, expect = 123, old = 0, n = 3;
+	const char *test_name = "test_cmpxchg fail";
+
+	printf("Testing %s\n", test_name);
+
+	v = orig_v;
+	ret = test_cmpxchg_op(&v, &expect, &old, &n, sizeof(uint64_t));
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	if (ret == 0) {
+		printf("%s returned %d, expecting %d\n",
+			test_name, ret, 1);
+		return -1;
+	}
+	if (v != orig_v) {
+		printf("%s v is %lld, expecting %lld\n",
+			test_name, (long long)v, (long long)orig_v);
+		return -1;
+	}
+	if (old != orig_v) {
+		printf("%s old is %lld, expecting %lld\n",
+			test_name, (long long)old, (long long)orig_v);
+		return -1;
+	}
+	return 0;
+}
+
+static int test_memcpy_expect_fault_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_memcpy_fault(void)
+{
+	int ret;
+	char buf1[TESTBUFLEN];
+	const char *test_name = "test_memcpy_fault";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	ret = test_memcpy_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EFAULT)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	/* Test memcpy expect fault */
+	ret = test_memcpy_expect_fault_op(buf1, NULL, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EAGAIN)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_unknown_op(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = -1,	/* Unknown */
+			.len = 0,
+		},
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_unknown_op(void)
+{
+	int ret;
+	const char *test_name = "test_unknown_op";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_unknown_op();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_max_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_max_ops(void)
+{
+	int ret;
+	const char *test_name = "test_max_ops";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_max_ops();
+	if (ret < 0) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int do_test_too_many_ops(void)
+{
+	struct cpu_op opvec[] = {
+		[0] = { .op = CPU_MB_OP, },
+		[1] = { .op = CPU_MB_OP, },
+		[2] = { .op = CPU_MB_OP, },
+		[3] = { .op = CPU_MB_OP, },
+		[4] = { .op = CPU_MB_OP, },
+		[5] = { .op = CPU_MB_OP, },
+		[6] = { .op = CPU_MB_OP, },
+		[7] = { .op = CPU_MB_OP, },
+		[8] = { .op = CPU_MB_OP, },
+		[9] = { .op = CPU_MB_OP, },
+		[10] = { .op = CPU_MB_OP, },
+		[11] = { .op = CPU_MB_OP, },
+		[12] = { .op = CPU_MB_OP, },
+		[13] = { .op = CPU_MB_OP, },
+		[14] = { .op = CPU_MB_OP, },
+		[15] = { .op = CPU_MB_OP, },
+		[16] = { .op = CPU_MB_OP, },
+	};
+	int cpu;
+
+	cpu = cpu_op_get_current_cpu();
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int test_too_many_ops(void)
+{
+	int ret;
+	const char *test_name = "test_too_many_ops";
+
+	printf("Testing %s\n", test_name);
+
+	ret = do_test_too_many_ops();
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int test_memcpy_single_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN + 1];
+	char buf2[TESTBUFLEN + 1];
+	const char *test_name = "test_memcpy_single_too_large";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN + 1; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN + 1);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN + 1);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+static int test_memcpy_single_ok_sum_too_large_op(void *dst, void *src, size_t len)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.dst, dst),
+			CPU_OP_FIELD_u32_u64_INIT_ONSTACK(.u.memcpy_op.src, src),
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+	int ret, cpu;
+
+	do {
+		cpu = cpu_op_get_current_cpu();
+		ret = cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+	} while (ret == -1 && errno == EAGAIN);
+
+	return ret;
+}
+
+static int test_memcpy_single_ok_sum_too_large(void)
+{
+	int i, ret;
+	char buf1[TESTBUFLEN];
+	char buf2[TESTBUFLEN];
+	const char *test_name = "test_memcpy_single_ok_sum_too_large";
+
+	printf("Testing %s\n", test_name);
+
+	/* Test memcpy */
+	for (i = 0; i < TESTBUFLEN; i++)
+		buf1[i] = (char)i;
+	memset(buf2, 0, TESTBUFLEN);
+	ret = test_memcpy_single_ok_sum_too_large_op(buf2, buf1, TESTBUFLEN);
+	if (!ret || (ret < 0 && errno != EINVAL)) {
+		printf("%s returned with %d, errno: %s\n",
+			test_name, ret, strerror(errno));
+		exit(-1);
+	}
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	int ret = 0;
+
+	ret |= test_compare_eq_same();
+	ret |= test_compare_eq_diff();
+	ret |= test_compare_ne_same();
+	ret |= test_compare_ne_diff();
+	ret |= test_2compare_eq_index();
+	ret |= test_2compare_ne_index();
+	ret |= test_memcpy();
+	ret |= test_memcpy_u32();
+	ret |= test_memcpy_mb_memcpy();
+	ret |= test_add();
+	ret |= test_two_add();
+	ret |= test_or();
+	ret |= test_and();
+	ret |= test_xor();
+	ret |= test_lshift();
+	ret |= test_rshift();
+	ret |= test_cmpxchg_success();
+	ret |= test_cmpxchg_fail();
+	ret |= test_memcpy_fault();
+	ret |= test_unknown_op();
+	ret |= test_max_ops();
+	ret |= test_too_many_ops();
+	ret |= test_memcpy_single_too_large();
+	ret |= test_memcpy_single_ok_sum_too_large();
+
+	return ret;
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.c b/tools/testing/selftests/cpu-opv/cpu-op.c
new file mode 100644
index 000000000000..d7ba481cca04
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.c
@@ -0,0 +1,348 @@
+/*
+ * cpu-op.c
+ *
+ * Copyright (C) 2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags)
+{
+	return syscall(__NR_cpu_opv, cpuopv, cpuopcnt, cpu, flags);
+}
+
+int cpu_op_get_current_cpu(void)
+{
+	int cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *n,
+		size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)old,
+			.u.memcpy_op.src = (unsigned long)v,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = len,
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)n,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_ADD_OP,
+			.len = len,
+			.u.arithmetic_op.p = (unsigned long)v,
+			.u.arithmetic_op.count = count,
+			.u.arithmetic_op.expect_fault_p = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+static int cpu_op_cmpeqv_storep_expect_fault(intptr_t *v, intptr_t expect,
+		intptr_t *newp, int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)newp,
+			.u.memcpy_op.expect_fault_dst = 0,
+			/* Return EAGAIN on src fault. */
+			.u.memcpy_op.expect_fault_src = 1,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	intptr_t oldv = READ_ONCE(*v);
+	intptr_t *newp = (intptr_t *)(oldv + voffp);
+	int ret;
+
+	if (oldv == expectnot)
+		return 1;
+	ret = cpu_op_cmpeqv_storep_expect_fault(v, oldv, newp, cpu);
+	if (!ret) {
+		*load = oldv;
+		return 0;
+	}
+	if (ret > 0) {
+		errno = EAGAIN;
+		return -1;
+	}
+	return -1;
+}
+
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v2,
+			.u.memcpy_op.src = (unsigned long)&newv2,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v2,
+			.u.compare_op.b = (unsigned long)&expect2,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	struct cpu_op opvec[] = {
+		[0] = {
+			.op = CPU_COMPARE_EQ_OP,
+			.len = sizeof(intptr_t),
+			.u.compare_op.a = (unsigned long)v,
+			.u.compare_op.b = (unsigned long)&expect,
+			.u.compare_op.expect_fault_a = 0,
+			.u.compare_op.expect_fault_b = 0,
+		},
+		[1] = {
+			.op = CPU_MEMCPY_OP,
+			.len = len,
+			.u.memcpy_op.dst = (unsigned long)dst,
+			.u.memcpy_op.src = (unsigned long)src,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+		[2] = {
+			.op = CPU_MB_OP,
+		},
+		[3] = {
+			.op = CPU_MEMCPY_OP,
+			.len = sizeof(intptr_t),
+			.u.memcpy_op.dst = (unsigned long)v,
+			.u.memcpy_op.src = (unsigned long)&newv,
+			.u.memcpy_op.expect_fault_dst = 0,
+			.u.memcpy_op.expect_fault_src = 0,
+		},
+	};
+
+	return cpu_opv(opvec, ARRAY_SIZE(opvec), cpu, 0);
+}
+
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu)
+{
+	return cpu_op_add(v, count, sizeof(intptr_t), cpu);
+}
diff --git a/tools/testing/selftests/cpu-opv/cpu-op.h b/tools/testing/selftests/cpu-opv/cpu-op.h
new file mode 100644
index 000000000000..ba2ec578ec50
--- /dev/null
+++ b/tools/testing/selftests/cpu-opv/cpu-op.h
@@ -0,0 +1,68 @@
+/*
+ * cpu-op.h
+ *
+ * (C) Copyright 2017 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef CPU_OPV_H
+#define CPU_OPV_H
+
+#include <stdlib.h>
+#include <stdint.h>
+#include <linux/cpu_opv.h>
+
+#define likely(x)		__builtin_expect(!!(x), 1)
+#define unlikely(x)		__builtin_expect(!!(x), 0)
+#define barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define ACCESS_ONCE(x)		(*(__volatile__  __typeof__(x) *)&(x))
+#define WRITE_ONCE(x, v)	__extension__ ({ ACCESS_ONCE(x) = (v); })
+#define READ_ONCE(x)		ACCESS_ONCE(x)
+
+int cpu_opv(struct cpu_op *cpuopv, int cpuopcnt, int cpu, int flags);
+int cpu_op_get_current_cpu(void);
+
+int cpu_op_cmpxchg(void *v, void *expect, void *old, void *_new,
+		size_t len, int cpu);
+int cpu_op_add(void *v, int64_t count, size_t len, int cpu);
+
+int cpu_op_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu);
+int cpu_op_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu);
+int cpu_op_cmpeqv_storev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_storev_mb_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_cmpeqv_memcpy_mb_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu);
+int cpu_op_addv(intptr_t *v, int64_t count, int cpu);
+
+#endif  /* CPU_OPV_H */
-- 
2.11.0


* [RFC PATCH v2 for 4.15 13/14] Restartable sequences: Provide self-tests
@ 2017-11-06 20:56   ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest

Implements two basic tests of RSEQ functionality, and one more
exhaustive parameterizable test.

The first, "basic_test", only asserts that RSEQ works moderately
correctly, e.g. that the CPUID pointer works.

"basic_percpu_ops_test" is a slightly more "realistic" variant,
implementing a few simple per-cpu operations and testing their
correctness.

"param_test" is a parametrizable restartable sequences test. See
the "--help" output for usage.

A run_param_test.sh script runs many variants of the parametrizable
tests.

As part of those tests, a helper library "rseq" implements a user-space
API around restartable sequences. It uses the cpu_opv system call as
fallback when single-stepped by a debugger. It exposes the instruction
pointer addresses where the rseq assembly blocks begin and end, as well
as the associated abort instruction pointer, in the __rseq_table
section. This section lets debuggers know where to place
breakpoints when single-stepping through assembly blocks which may be
aborted at any point by the kernel.
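
As a rough illustration of how a debugger might consume such a table,
here is a minimal sketch. The entry layout and the names
demo_rseq_entry and demo_abort_target are hypothetical and chosen only
for this example; the actual __rseq_table format is defined by the
per-architecture rseq headers and may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout; the real __rseq_table format may differ. */
struct demo_rseq_entry {
	uintptr_t start_ip;	/* first instruction of the assembly block */
	uintptr_t end_ip;	/* first address past the assembly block */
	uintptr_t abort_ip;	/* where execution resumes after an abort */
};

/*
 * Return the abort IP of the block containing ip, or 0 if ip is not
 * inside any block described by the table.
 */
uintptr_t demo_abort_target(const struct demo_rseq_entry *table,
		size_t nr_entries, uintptr_t ip)
{
	size_t i;

	for (i = 0; i < nr_entries; i++) {
		if (ip >= table[i].start_ip && ip < table[i].end_ip)
			return table[i].abort_ip;
	}
	return 0;
}
```

A debugger about to single-step at a given address could call such a
lookup and, on a non-zero result, also place a breakpoint at the
returned abort target.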

The rseq library exposes APIs that present the fast-path operations.
The usage from user-space is, e.g. for a counter increment:

    cpu = rseq_cpu_start();
    ret = rseq_addv(&data->c[cpu].count, 1, cpu);
    if (likely(!ret))
        return 0;        /* Success. */
    do {
        cpu = rseq_current_cpu();
        ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
        if (likely(!ret))
            return 0;    /* Success. */
    } while (ret > 0 || errno == EAGAIN);
    perror("cpu_op_addv");
    return -1;           /* Unexpected error. */
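
The same control flow can be exercised without kernel support using
stand-in helpers. The sketch below is only an illustration: fake_cpu(),
fake_rseq_addv() and fake_cpu_op_addv() are hypothetical stubs that
mimic the return conventions described above (0 on success, > 0 on
fast-path abort, -1 with errno set on error). They are not the real
rseq or cpu_opv helpers.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical test knobs, not part of the rseq library. */
static int fake_fast_aborts = 1;	/* times the fast path will "abort" */
static int fake_slow_calls;		/* number of slow-path invocations */

/* Stand-in for rseq_cpu_start()/rseq_current_cpu(): always CPU 0. */
static int fake_cpu(void)
{
	return 0;
}

/* Stand-in for rseq_addv(): returns > 0 when the sequence is aborted. */
static int fake_rseq_addv(intptr_t *v, intptr_t count, int cpu)
{
	(void)cpu;
	if (fake_fast_aborts > 0) {
		fake_fast_aborts--;
		return 1;
	}
	*v += count;
	return 0;
}

/* Stand-in for cpu_op_addv(): always succeeds in this sketch. */
static int fake_cpu_op_addv(intptr_t *v, int64_t count, int cpu)
{
	(void)cpu;
	fake_slow_calls++;
	*v += count;
	return 0;
}

/* Fast path first, then retry on the slow path until it succeeds. */
int percpu_inc(intptr_t *counters)
{
	int cpu, ret;

	cpu = fake_cpu();
	ret = fake_rseq_addv(&counters[cpu], 1, cpu);
	if (!ret)
		return 0;	/* Success on the fast path. */
	do {
		cpu = fake_cpu();
		ret = fake_cpu_op_addv(&counters[cpu], 1, cpu);
		if (!ret)
			return 0;	/* Success on the slow path. */
	} while (ret > 0 || errno == EAGAIN);
	return -1;		/* Unexpected error. */
}
```

With fake_fast_aborts set to 1, the first call to percpu_inc() falls
back to the slow path once, and later calls complete on the fast path.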

PowerPC tests have been implemented by Boqun Feng.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---

Changes since v1:
- Provide abort-ip signature: The abort-ip signature is located just
  before the abort-ip target. It is currently hardcoded, but a
  user-space application could use the __rseq_table to iterate on all
  abort-ip targets and use a random value as signature if needed in the
  future.
- Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
  sections need to issue rseq_prepare_unload() on each thread at least
  once before reclaim of struct rseq_cs.
- Use initial-exec TLS model, non-weak symbol: The initial-exec model is
  signal-safe, whereas the global-dynamic model is not.  Remove the
  "weak" symbol attribute from the __rseq_abi in rseq.c. The librseq.so
  library will have ownership of that symbol, and there is no reason for
  an application or user library to try to define that symbol.
  The expected use is to link against librseq.so, which owns and provides
  that symbol.
- Set cpu_id to -2 on register error
- Add rseq_len syscall parameter, rseq_cs version
- Ensure disassembler-friendly signature: x86 32/64 disassemblers have a
  hard time decoding the instruction stream after a bad instruction. Use
  a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
- Exercise parametrized test variants in a shell script.
- Restartable sequences selftests: Remove use of event counter.
- Use cpu_id_start field:  With the cpu_id_start field, the C
  preparation phase of the fast-path does not need to compare cpu_id < 0
  anymore.
- Signal-safe registration and refcounting: Allow libraries using
  librseq.so to register it from signal handlers.
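
To illustrate the initial-exec TLS and refcounting items above, here is
a minimal sketch. The names demo_abi, demo_register() and
demo_unregister() and the field values are hypothetical. The real
__rseq_abi layout and registration path live in rseq.h and rseq.c, and
this sketch does not attempt to reproduce their signal-safety.

```c
#include <stdint.h>

/* Hypothetical per-thread state; the real __rseq_abi layout differs. */
struct demo_abi {
	int32_t cpu_id;		/* -1 while unregistered (assumed value) */
	uint32_t refcount;	/* nested registrations by this thread */
};

/* Request the initial-exec TLS model for this thread-local symbol. */
__thread struct demo_abi demo_abi
	__attribute__((tls_model("initial-exec"))) = { .cpu_id = -1 };

/* The first registration on a thread would issue the real syscall. */
int demo_register(void)
{
	if (demo_abi.refcount++ == 0)
		demo_abi.cpu_id = 0;	/* stand-in for the registration */
	return 0;
}

/* The last unregistration on a thread tears the state down again. */
int demo_unregister(void)
{
	if (demo_abi.refcount == 0)
		return -1;	/* unbalanced call */
	if (--demo_abi.refcount == 0)
		demo_abi.cpu_id = -1;
	return 0;
}
```

Nested users on the same thread, e.g. a library and the application,
can then each pair a register call with an unregister call without
tearing down state that the other still relies on.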
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c |   13 +-
 tools/testing/selftests/rseq/.gitignore            |    4 +
 tools/testing/selftests/rseq/Makefile              |   22 +
 .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
 tools/testing/selftests/rseq/basic_test.c          |   55 +
 tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
 tools/testing/selftests/rseq/rseq-arm.h            |  535 ++++++++
 tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
 tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
 tools/testing/selftests/rseq/rseq.c                |  116 ++
 tools/testing/selftests/rseq/rseq.h                |  154 +++
 tools/testing/selftests/rseq/run_param_test.sh     |  124 ++
 14 files changed, 4103 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/rseq/.gitignore
 create mode 100644 tools/testing/selftests/rseq/Makefile
 create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
 create mode 100644 tools/testing/selftests/rseq/basic_test.c
 create mode 100644 tools/testing/selftests/rseq/param_test.c
 create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
 create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
 create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
 create mode 100644 tools/testing/selftests/rseq/rseq.c
 create mode 100644 tools/testing/selftests/rseq/rseq.h
 create mode 100755 tools/testing/selftests/rseq/run_param_test.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 54e11f0569e0..1022b5f51cd1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11502,6 +11502,7 @@ S:	Supported
 F:	kernel/rseq.c
 F:	include/uapi/linux/rseq.h
 F:	include/trace/events/rseq.h
+F:	tools/testing/selftests/rseq/
 
 RFKILL
 M:	Johannes Berg <johannes@sipsolutions.net>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index c66e5e67cfab..b7fcd7bcb87e 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -25,6 +25,7 @@ TARGETS += nsfs
 TARGETS += powerpc
 TARGETS += pstore
 TARGETS += ptrace
+TARGETS += rseq
 TARGETS += seccomp
 TARGETS += sigaltstack
 TARGETS += size
diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
index 23072dcf5612..6b624f1939ea 100644
--- a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
+++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
@@ -19,6 +19,8 @@
 #define TESTBUFLEN	4096
 #define TESTBUFLEN_CMP	16
 
+#define TESTBUFLEN_PAGE_MAX	65536
+
 static int test_compare_eq_op(char *a, char *b, size_t len)
 {
 	struct cpu_op opvec[] = {
@@ -1047,20 +1049,21 @@ static int test_too_many_ops(void)
 	return 0;
 }
 
+/* Use 64kB len, largest page size known on Linux. */
 static int test_memcpy_single_too_large(void)
 {
 	int i, ret;
-	char buf1[TESTBUFLEN + 1];
-	char buf2[TESTBUFLEN + 1];
+	char buf1[TESTBUFLEN_PAGE_MAX + 1];
+	char buf2[TESTBUFLEN_PAGE_MAX + 1];
 	const char *test_name = "test_memcpy_single_too_large";
 
 	printf("Testing %s\n", test_name);
 
 	/* Test memcpy */
-	for (i = 0; i < TESTBUFLEN + 1; i++)
+	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
 		buf1[i] = (char)i;
-	memset(buf2, 0, TESTBUFLEN + 1);
-	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN + 1);
+	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
 	if (!ret || (ret < 0 && errno != EINVAL)) {
 		printf("%s returned with %d, errno: %s\n",
 			test_name, ret, strerror(errno));
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
new file mode 100644
index 000000000000..9409c3db99b2
--- /dev/null
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -0,0 +1,4 @@
+basic_percpu_ops_test
+basic_test
+basic_rseq_op_test
+param_test
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
new file mode 100644
index 000000000000..e9b0562dd450
--- /dev/null
+++ b/tools/testing/selftests/rseq/Makefile
@@ -0,0 +1,22 @@
+CFLAGS += -O2 -Wall -g -I./ -I../cpu-opv/ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+LDLIBS += -lpthread
+
+TEST_GEN_PROGS = basic_test basic_percpu_ops_test param_test \
+		librseq.so libcpu-op.so
+
+ALL: $(TEST_GEN_PROGS)
+
+librseq.so: rseq.c rseq.h rseq-*.h
+	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
+
+libcpu-op.so: ../cpu-opv/cpu-op.c ../cpu-opv/cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
+
+# Own recipe because we only want to build against 1st prerequisite, but
+# still track changes to header files.
+%: %.c librseq.so libcpu-op.so rseq.h rseq-*.h ../cpu-opv/cpu-op.h
+	$(CC) $(CFLAGS) $< $(LDLIBS) -lrseq -lcpu-op -o $@
+
+TEST_PROGS = run_param_test.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
new file mode 100644
index 000000000000..e5f7fed06a03
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
@@ -0,0 +1,333 @@
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "rseq.h"
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+struct percpu_lock_entry {
+	intptr_t v;
+} __attribute__((aligned(128)));
+
+struct percpu_lock {
+	struct percpu_lock_entry c[CPU_SETSIZE];
+};
+
+struct test_data_entry {
+	intptr_t count;
+} __attribute__((aligned(128)));
+
+struct spinlock_test_data {
+	struct percpu_lock lock;
+	struct test_data_entry c[CPU_SETSIZE];
+	int reps;
+};
+
+struct percpu_list_node {
+	intptr_t data;
+	struct percpu_list_node *next;
+};
+
+struct percpu_list_entry {
+	struct percpu_list_node *head;
+} __attribute__((aligned(128)));
+
+struct percpu_list {
+	struct percpu_list_entry c[CPU_SETSIZE];
+};
+
+/* A simple percpu spinlock.  Returns the cpu lock was acquired on. */
+int rseq_percpu_lock(struct percpu_lock *lock)
+{
+	int cpu;
+
+	for (;;) {
+		int ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
+				0, 1, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			continue;	/* Retry. */
+#endif
+	slowpath:
+		__attribute__((unused));
+		/* Fallback on cpu_opv system call. */
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	/*
+	 * Acquire semantic when taking lock after control dependency.
+	 * Matches rseq_smp_store_release().
+	 */
+	rseq_smp_acquire__after_ctrl_dep();
+	return cpu;
+}
+
+void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
+{
+	assert(lock->c[cpu].v == 1);
+	/*
+	 * Release lock, with release semantic. Matches
+	 * rseq_smp_acquire__after_ctrl_dep().
+	 */
+	rseq_smp_store_release(&lock->c[cpu].v, 0);
+}
+
+void *test_percpu_spinlock_thread(void *arg)
+{
+	struct spinlock_test_data *data = arg;
+	int i, cpu;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	for (i = 0; i < data->reps; i++) {
+		cpu = rseq_percpu_lock(&data->lock);
+		data->c[cpu].count++;
+		rseq_percpu_unlock(&data->lock, cpu);
+	}
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	return NULL;
+}
+
+/*
+ * A simple test which implements a sharded counter using a per-cpu
+ * lock.  Obviously real applications might prefer to simply use a
+ * per-cpu increment; however, this is reasonable for a test and the
+ * lock can be extended to synchronize more complicated operations.
+ */
+void test_percpu_spinlock(void)
+{
+	const int num_threads = 200;
+	int i;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct spinlock_test_data data;
+
+	memset(&data, 0, sizeof(data));
+	data.reps = 5000;
+
+	for (i = 0; i < num_threads; i++)
+		pthread_create(&test_threads[i], NULL,
+			test_percpu_spinlock_thread, &data);
+
+	for (i = 0; i < num_threads; i++)
+		pthread_join(test_threads[i], NULL);
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)data.reps * num_threads);
+}
+
+int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
+{
+	intptr_t *targetptr, newval, expect;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load list->c[cpu].head with single-copy atomicity. */
+	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+	newval = (intptr_t)node;
+	targetptr = (intptr_t *)&list->c[cpu].head;
+	node->next = (struct percpu_list_node *)expect;
+	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
+	if (likely(!ret))
+		return cpu;
+#endif
+	/* Fallback on cpu_opv system call. */
+	slowpath:
+		__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load list->c[cpu].head with single-copy atomicity. */
+		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+		newval = (intptr_t)node;
+		targetptr = (intptr_t *)&list->c[cpu].head;
+		node->next = (struct percpu_list_node *)expect;
+		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return cpu;
+}
+
+/*
+ * Unlike a traditional lock-less linked list; the availability of a
+ * rseq primitive allows us to implement pop without concerns over
+ * ABA-type races.
+ */
+struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
+{
+	struct percpu_list_node *head;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
+		(intptr_t)NULL,
+		offsetof(struct percpu_list_node, next),
+		(intptr_t *)&head, cpu);
+	if (likely(!ret))
+		return head;
+	if (ret > 0)
+		return NULL;
+#endif
+	/* Fallback on cpu_opv system call. */
+	slowpath:
+		__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpnev_storeoffp_load(
+			(intptr_t *)&list->c[cpu].head,
+			(intptr_t)NULL,
+			offsetof(struct percpu_list_node, next),
+			(intptr_t *)&head, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			return NULL;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_list_thread(void *arg)
+{
+	int i;
+	struct percpu_list *list = (struct percpu_list *)arg;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	for (i = 0; i < 100000; i++) {
+		struct percpu_list_node *node = percpu_list_pop(list);
+
+		sched_yield();  /* encourage shuffling */
+		if (node)
+			percpu_list_push(list, node);
+	}
+
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu linked list from many threads.  */
+void test_percpu_list(void)
+{
+	int i, j;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_list list;
+	pthread_t test_threads[200];
+	cpu_set_t allowed_cpus;
+
+	memset(&list, 0, sizeof(list));
+
+	/* Generate list entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		for (j = 1; j <= 100; j++) {
+			struct percpu_list_node *node;
+
+			expected_sum += j;
+
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			node->next = list.c[i].head;
+			list.c[i].head = node;
+		}
+	}
+
+	for (i = 0; i < 200; i++)
+		assert(pthread_create(&test_threads[i], NULL,
+			test_percpu_list_thread, &list) == 0);
+
+	for (i = 0; i < 200; i++)
+		pthread_join(test_threads[i], NULL);
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_list_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_list_pop(&list))) {
+			sum += node->data;
+			free(node);
+		}
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+int main(int argc, char **argv)
+{
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto error;
+	}
+	printf("spinlock\n");
+	test_percpu_spinlock();
+	printf("percpu_list\n");
+	test_percpu_list();
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto error;
+	}
+	return 0;
+
+error:
+	return -1;
+}
+
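For readers who have not used rseq, the compare-then-store shape of percpu_list_push() and percpu_list_pop() above can be sketched with plain C11 atomics. This is only an illustration under stated assumptions: the sketch_* names and SKETCH_NR_CPUS are hypothetical and not part of this patch, the caller passes the slot index explicitly instead of reading the current cpu, and atomic_compare_exchange_weak() stands in for rseq_cmpeqv_storev() and rseq_cmpnev_storeoffp_load(). Unlike the rseq version, a CAS-based pop like this one is exposed to ABA when nodes are reused concurrently, which is the concern the comment above percpu_list_pop() refers to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS	4	/* hypothetical: small fixed number of slots */

struct sketch_node {
	intptr_t data;
	struct sketch_node *next;
};

struct sketch_list {
	_Atomic(struct sketch_node *) head[SKETCH_NR_CPUS];
};

/* Push node on the list of slot "cpu": link it, then CAS the head. */
static void sketch_push(struct sketch_list *list, int cpu,
		struct sketch_node *node)
{
	struct sketch_node *expect;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	expect = atomic_load(&list->head[cpu]);
	do {
		/* On CAS failure, expect holds the refreshed head. */
		node->next = expect;
	} while (!atomic_compare_exchange_weak(&list->head[cpu],
			&expect, node));
}

/* Pop the first node of slot "cpu", or return NULL if it is empty. */
static struct sketch_node *sketch_pop(struct sketch_list *list, int cpu)
{
	struct sketch_node *head;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	head = atomic_load(&list->head[cpu]);
	/* ABA-prone under concurrent reuse; the rseq fast path is not. */
	while (head && !atomic_compare_exchange_weak(&list->head[cpu],
			&head, head->next))
		;
	return head;
}
```

Pushing two nodes on slot 0 and popping three times returns them in LIFO order and then NULL; other slots stay empty throughout.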
diff --git a/tools/testing/selftests/rseq/basic_test.c b/tools/testing/selftests/rseq/basic_test.c
new file mode 100644
index 000000000000..e2086b3885d7
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_test.c
@@ -0,0 +1,55 @@
+/*
+ * Basic test coverage for critical regions and rseq_current_cpu().
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+
+#include "rseq.h"
+
+void test_cpu_pointer(void)
+{
+	cpu_set_t affinity, test_affinity;
+	int i;
+
+	sched_getaffinity(0, sizeof(affinity), &affinity);
+	CPU_ZERO(&test_affinity);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (CPU_ISSET(i, &affinity)) {
+			CPU_SET(i, &test_affinity);
+			sched_setaffinity(0, sizeof(test_affinity),
+					&test_affinity);
+			assert(sched_getcpu() == i);
+			assert(rseq_current_cpu() == i);
+			assert(rseq_current_cpu_raw() == i);
+			assert(rseq_cpu_start() == i);
+			CPU_CLR(i, &test_affinity);
+		}
+	}
+	sched_setaffinity(0, sizeof(affinity), &affinity);
+}
+
+int main(int argc, char **argv)
+{
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	printf("testing current cpu\n");
+	test_cpu_pointer();
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	return 0;
+
+init_thread_error:
+	return -1;
+}
diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c
new file mode 100644
index 000000000000..7b34d333d1f7
--- /dev/null
+++ b/tools/testing/selftests/rseq/param_test.c
@@ -0,0 +1,1285 @@
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syscall.h>
+#include <unistd.h>
+#include <poll.h>
+#include <sys/types.h>
+#include <signal.h>
+#include <errno.h>
+#include <stddef.h>
+
+#include "cpu-op.h"
+
+static inline pid_t gettid(void)
+{
+	return syscall(__NR_gettid);
+}
+
+#define NR_INJECT	9
+static int loop_cnt[NR_INJECT + 1];
+
+static int opt_modulo, quiet;
+
+static int opt_yield, opt_signal, opt_sleep,
+		opt_disable_rseq, opt_threads = 200,
+		opt_disable_mod = 0, opt_test = 's', opt_mb = 0;
+
+static long long opt_reps = 5000;
+
+static __thread __attribute__((tls_model("initial-exec"))) unsigned int signals_delivered;
+
+#ifndef BENCHMARK
+
+static __thread __attribute__((tls_model("initial-exec"))) unsigned int yield_mod_cnt, nr_abort;
+
+#define printf_verbose(fmt, ...)			\
+	do {						\
+		if (!quiet)				\
+			printf(fmt, ## __VA_ARGS__);	\
+	} while (0)
+
+#define RSEQ_INJECT_INPUT \
+	, [loop_cnt_1]"m"(loop_cnt[1]) \
+	, [loop_cnt_2]"m"(loop_cnt[2]) \
+	, [loop_cnt_3]"m"(loop_cnt[3]) \
+	, [loop_cnt_4]"m"(loop_cnt[4]) \
+	, [loop_cnt_5]"m"(loop_cnt[5]) \
+	, [loop_cnt_6]"m"(loop_cnt[6])
+
+#if defined(__x86_64__) || defined(__i386__)
+
+#define INJECT_ASM_REG	"eax"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"mov %[loop_cnt_" #n "], %%" INJECT_ASM_REG "\n\t" \
+	"test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \
+	"jz 333f\n\t" \
+	"222:\n\t" \
+	"dec %%" INJECT_ASM_REG "\n\t" \
+	"jnz 222b\n\t" \
+	"333:\n\t"
+
+#elif defined(__ARMEL__)
+
+#define INJECT_ASM_REG	"r4"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"ldr " INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
+	"cmp " INJECT_ASM_REG ", #0\n\t" \
+	"beq 333f\n\t" \
+	"222:\n\t" \
+	"subs " INJECT_ASM_REG ", #1\n\t" \
+	"bne 222b\n\t" \
+	"333:\n\t"
+
+#elif __PPC__
+#define INJECT_ASM_REG	"r18"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"lwz %%" INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
+	"cmpwi %%" INJECT_ASM_REG ", 0\n\t" \
+	"beq 333f\n\t" \
+	"222:\n\t" \
+	"subic. %%" INJECT_ASM_REG ", %%" INJECT_ASM_REG ", 1\n\t" \
+	"bne 222b\n\t" \
+	"333:\n\t"
+#else
+#error unsupported target
+#endif
+
+#define RSEQ_INJECT_FAILED \
+	nr_abort++;
+
+#define RSEQ_INJECT_C(n) \
+{ \
+	int loc_i, loc_nr_loops = loop_cnt[n]; \
+	\
+	for (loc_i = 0; loc_i < loc_nr_loops; loc_i++) { \
+		barrier(); \
+	} \
+	if (loc_nr_loops == -1 && opt_modulo) { \
+		if (yield_mod_cnt == opt_modulo - 1) { \
+			if (opt_sleep > 0) \
+				poll(NULL, 0, opt_sleep); \
+			if (opt_yield) \
+				sched_yield(); \
+			if (opt_signal) \
+				raise(SIGUSR1); \
+			yield_mod_cnt = 0; \
+		} else { \
+			yield_mod_cnt++; \
+		} \
+	} \
+}
+
+#else
+
+#define printf_verbose(fmt, ...)
+
+#endif /* BENCHMARK */
+
+#include "rseq.h"
+
+struct percpu_lock_entry {
+	intptr_t v;
+} __attribute__((aligned(128)));
+
+struct percpu_lock {
+	struct percpu_lock_entry c[CPU_SETSIZE];
+};
+
+struct test_data_entry {
+	intptr_t count;
+} __attribute__((aligned(128)));
+
+struct spinlock_test_data {
+	struct percpu_lock lock;
+	struct test_data_entry c[CPU_SETSIZE];
+};
+
+struct spinlock_thread_test_data {
+	struct spinlock_test_data *data;
+	long long reps;
+	int reg;
+};
+
+struct inc_test_data {
+	struct test_data_entry c[CPU_SETSIZE];
+};
+
+struct inc_thread_test_data {
+	struct inc_test_data *data;
+	long long reps;
+	int reg;
+};
+
+struct percpu_list_node {
+	intptr_t data;
+	struct percpu_list_node *next;
+};
+
+struct percpu_list_entry {
+	struct percpu_list_node *head;
+} __attribute__((aligned(128)));
+
+struct percpu_list {
+	struct percpu_list_entry c[CPU_SETSIZE];
+};
+
+#define BUFFER_ITEM_PER_CPU	100
+
+struct percpu_buffer_node {
+	intptr_t data;
+};
+
+struct percpu_buffer_entry {
+	intptr_t offset;
+	intptr_t buflen;
+	struct percpu_buffer_node **array;
+} __attribute__((aligned(128)));
+
+struct percpu_buffer {
+	struct percpu_buffer_entry c[CPU_SETSIZE];
+};
+
+#define MEMCPY_BUFFER_ITEM_PER_CPU	100
+
+struct percpu_memcpy_buffer_node {
+	intptr_t data1;
+	uint64_t data2;
+};
+
+struct percpu_memcpy_buffer_entry {
+	intptr_t offset;
+	intptr_t buflen;
+	struct percpu_memcpy_buffer_node *array;
+} __attribute__((aligned(128)));
+
+struct percpu_memcpy_buffer {
+	struct percpu_memcpy_buffer_entry c[CPU_SETSIZE];
+};
+
+/* A simple percpu spinlock.  Returns the cpu on which the lock was acquired. */
+static int rseq_percpu_lock(struct percpu_lock *lock)
+{
+	int cpu;
+
+	for (;;) {
+		int ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
+				0, 1, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			continue;	/* Retry. */
+#endif
+	slowpath:
+		__attribute__((unused));
+		/* Fallback on cpu_opv system call. */
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	/*
+	 * Acquire semantic when taking lock after control dependency.
+	 * Matches rseq_smp_store_release().
+	 */
+	rseq_smp_acquire__after_ctrl_dep();
+	return cpu;
+}
+
+static void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
+{
+	assert(lock->c[cpu].v == 1);
+	/*
+	 * Release lock, with release semantic. Matches
+	 * rseq_smp_acquire__after_ctrl_dep().
+	 */
+	rseq_smp_store_release(&lock->c[cpu].v, 0);
+}
+
+void *test_percpu_spinlock_thread(void *arg)
+{
+	struct spinlock_thread_test_data *thread_data = arg;
+	struct spinlock_test_data *data = thread_data->data;
+	int cpu;
+	long long i, reps;
+
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_register_current_thread())
+		abort();
+	reps = thread_data->reps;
+	for (i = 0; i < reps; i++) {
+		cpu = rseq_percpu_lock(&data->lock);
+		data->c[cpu].count++;
+		rseq_percpu_unlock(&data->lock, cpu);
+#ifndef BENCHMARK
+		if (i != 0 && reps >= 10 && !(i % (reps / 10)))
+			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
+#endif
+	}
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_unregister_current_thread())
+		abort();
+	return NULL;
+}
+
+/*
+ * A simple test which implements a sharded counter using a per-cpu
+ * lock.  Obviously real applications might prefer to simply use a
+ * per-cpu increment; however, this is reasonable for a test and the
+ * lock can be extended to synchronize more complicated operations.
+ */
+void test_percpu_spinlock(void)
+{
+	const int num_threads = opt_threads;
+	int i, ret;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct spinlock_test_data data;
+	struct spinlock_thread_test_data thread_data[num_threads];
+
+	memset(&data, 0, sizeof(data));
+	for (i = 0; i < num_threads; i++) {
+		thread_data[i].reps = opt_reps;
+		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
+			thread_data[i].reg = 1;
+		else
+			thread_data[i].reg = 0;
+		thread_data[i].data = &data;
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_spinlock_thread, &thread_data[i]);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)opt_reps * num_threads);
+}
+
+void *test_percpu_inc_thread(void *arg)
+{
+	struct inc_thread_test_data *thread_data = arg;
+	struct inc_test_data *data = thread_data->data;
+	long long i, reps;
+
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_register_current_thread())
+		abort();
+	reps = thread_data->reps;
+	for (i = 0; i < reps; i++) {
+		int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
+		if (likely(!ret))
+			goto next;
+#endif
+	slowpath:
+		__attribute__((unused));
+		for (;;) {
+			/* Fallback on cpu_opv system call. */
+			cpu = rseq_current_cpu();
+			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
+			if (likely(!ret))
+				break;
+			assert(ret >= 0 || errno == EAGAIN);
+		}
+	next:
+		__attribute__((unused));
+#ifndef BENCHMARK
+		if (i != 0 && reps >= 10 && !(i % (reps / 10)))
+			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
+#endif
+	}
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_unregister_current_thread())
+		abort();
+	return NULL;
+}
+
+void test_percpu_inc(void)
+{
+	const int num_threads = opt_threads;
+	int i, ret;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct inc_test_data data;
+	struct inc_thread_test_data thread_data[num_threads];
+
+	memset(&data, 0, sizeof(data));
+	for (i = 0; i < num_threads; i++) {
+		thread_data[i].reps = opt_reps;
+		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
+			thread_data[i].reg = 1;
+		else
+			thread_data[i].reg = 0;
+		thread_data[i].data = &data;
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_inc_thread, &thread_data[i]);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)opt_reps * num_threads);
+}
+
+int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
+{
+	intptr_t *targetptr, newval, expect;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load list->c[cpu].head with single-copy atomicity. */
+	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+	newval = (intptr_t)node;
+	targetptr = (intptr_t *)&list->c[cpu].head;
+	node->next = (struct percpu_list_node *)expect;
+	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
+	if (likely(!ret))
+		return cpu;
+#endif
+	/* Fallback on cpu_opv system call. */
+slowpath:
+	__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load list->c[cpu].head with single-copy atomicity. */
+		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+		newval = (intptr_t)node;
+		targetptr = (intptr_t *)&list->c[cpu].head;
+		node->next = (struct percpu_list_node *)expect;
+		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return cpu;
+}
+
+/*
+ * Unlike a traditional lock-less linked list, the availability of a
+ * rseq primitive allows us to implement pop without concerns over
+ * ABA-type races.
+ */
+struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
+{
+	struct percpu_list_node *head;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
+		(intptr_t)NULL,
+		offsetof(struct percpu_list_node, next),
+		(intptr_t *)&head, cpu);
+	if (likely(!ret))
+		return head;
+	if (ret > 0)
+		return NULL;
+#endif
+	/* Fallback on cpu_opv system call. */
+	slowpath:
+		__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpnev_storeoffp_load(
+			(intptr_t *)&list->c[cpu].head,
+			(intptr_t)NULL,
+			offsetof(struct percpu_list_node, next),
+			(intptr_t *)&head, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			return NULL;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_list_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_list *list = (struct percpu_list *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_list_node *node = percpu_list_pop(list);
+
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (node)
+			percpu_list_push(list, node);
+	}
+
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu linked list from many threads.  */
+void test_percpu_list(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_list list;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&list, 0, sizeof(list));
+
+	/* Generate list entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		for (j = 1; j <= 100; j++) {
+			struct percpu_list_node *node;
+
+			expected_sum += j;
+
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			node->next = list.c[i].head;
+			list.c[i].head = node;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_list_thread, &list);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_list_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_list_pop(&list))) {
+			sum += node->data;
+			free(node);
+		}
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+bool percpu_buffer_push(struct percpu_buffer *buffer,
+		struct percpu_buffer_node *node)
+{
+	intptr_t *targetptr_spec, newval_spec;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == buffer->c[cpu].buflen) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	newval_spec = (intptr_t)node;
+	targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
+	newval_final = offset + 1;
+	targetptr_final = &buffer->c[cpu].offset;
+	if (opt_mb)
+		ret = rseq_cmpeqv_trystorev_storev_release(targetptr_final,
+			offset, targetptr_spec, newval_spec,
+			newval_final, cpu);
+	else
+		ret = rseq_cmpeqv_trystorev_storev(targetptr_final,
+			offset, targetptr_spec, newval_spec,
+			newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == buffer->c[cpu].buflen)
+			return false;
+		newval_spec = (intptr_t)node;
+		targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
+		newval_final = offset + 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		if (opt_mb)
+			ret = cpu_op_cmpeqv_storev_mb_storev(targetptr_final,
+				offset, targetptr_spec, newval_spec,
+				newval_final, cpu);
+		else
+			ret = cpu_op_cmpeqv_storev_storev(targetptr_final,
+				offset, targetptr_spec, newval_spec,
+				newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+struct percpu_buffer_node *percpu_buffer_pop(struct percpu_buffer *buffer)
+{
+	struct percpu_buffer_node *head;
+	intptr_t *targetptr, newval;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == 0) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return NULL;
+	}
+	head = buffer->c[cpu].array[offset - 1];
+	newval = offset - 1;
+	targetptr = (intptr_t *)&buffer->c[cpu].offset;
+	ret = rseq_cmpeqv_cmpeqv_storev(targetptr, offset,
+		(intptr_t *)&buffer->c[cpu].array[offset - 1], (intptr_t)head,
+		newval, cpu);
+	if (likely(!ret))
+		return head;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == 0)
+			return NULL;
+		head = buffer->c[cpu].array[offset - 1];
+		newval = offset - 1;
+		targetptr = (intptr_t *)&buffer->c[cpu].offset;
+		ret = cpu_op_cmpeqv_cmpeqv_storev(targetptr, offset,
+			(intptr_t *)&buffer->c[cpu].array[offset - 1],
+			(intptr_t)head, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_buffer_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_buffer *buffer = (struct percpu_buffer *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_buffer_node *node = percpu_buffer_pop(buffer);
+
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (node) {
+			if (!percpu_buffer_push(buffer, node)) {
+				/* Should increase buffer size. */
+				abort();
+			}
+		}
+	}
+
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu buffer from many threads.  */
+void test_percpu_buffer(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_buffer buffer;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&buffer, 0, sizeof(buffer));
+
+	/* Generate buffer entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		/* Worst-case is every item in same CPU. */
+		buffer.c[i].array =
+			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
+				* BUFFER_ITEM_PER_CPU);
+		assert(buffer.c[i].array);
+		buffer.c[i].buflen = CPU_SETSIZE * BUFFER_ITEM_PER_CPU;
+		for (j = 1; j <= BUFFER_ITEM_PER_CPU; j++) {
+			struct percpu_buffer_node *node;
+
+			expected_sum += j;
+
+			/*
+			 * We could theoretically put the word-sized
+			 * "data" directly in the buffer. However, we
+			 * want to model objects that would not fit
+			 * within a single word, so allocate an object
+			 * for each node.
+			 */
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			buffer.c[i].array[j - 1] = node;
+			buffer.c[i].offset++;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_buffer_thread, &buffer);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_buffer_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_buffer_pop(&buffer))) {
+			sum += node->data;
+			free(node);
+		}
+		free(buffer.c[i].array);
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+bool percpu_memcpy_buffer_push(struct percpu_memcpy_buffer *buffer,
+		struct percpu_memcpy_buffer_node item)
+{
+	char *destptr, *srcptr;
+	size_t copylen;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == buffer->c[cpu].buflen) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	destptr = (char *)&buffer->c[cpu].array[offset];
+	srcptr = (char *)&item;
+	copylen = sizeof(item);
+	newval_final = offset + 1;
+	targetptr_final = &buffer->c[cpu].offset;
+	if (opt_mb)
+		ret = rseq_cmpeqv_trymemcpy_storev_release(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+	else
+		ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == buffer->c[cpu].buflen)
+			return false;
+		destptr = (char *)&buffer->c[cpu].array[offset];
+		srcptr = (char *)&item;
+		copylen = sizeof(item);
+		newval_final = offset + 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		/* copylen must be <= PAGE_SIZE. */
+		if (opt_mb)
+			ret = cpu_op_cmpeqv_memcpy_mb_storev(targetptr_final,
+				offset, destptr, srcptr, copylen,
+				newval_final, cpu);
+		else
+			ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
+				offset, destptr, srcptr, copylen,
+				newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+bool percpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer,
+		struct percpu_memcpy_buffer_node *item)
+{
+	char *destptr, *srcptr;
+	size_t copylen;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == 0) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	destptr = (char *)item;
+	srcptr = (char *)&buffer->c[cpu].array[offset - 1];
+	copylen = sizeof(*item);
+	newval_final = offset - 1;
+	targetptr_final = &buffer->c[cpu].offset;
+	ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
+		offset, destptr, srcptr, copylen,
+		newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == 0)
+			return false;
+		destptr = (char *)item;
+		srcptr = (char *)&buffer->c[cpu].array[offset - 1];
+		copylen = sizeof(*item);
+		newval_final = offset - 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		/* copylen must be <= PAGE_SIZE. */
+		ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+void *test_percpu_memcpy_buffer_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_memcpy_buffer *buffer = (struct percpu_memcpy_buffer *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_memcpy_buffer_node item;
+		bool result;
+
+		result = percpu_memcpy_buffer_pop(buffer, &item);
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (result) {
+			if (!percpu_memcpy_buffer_push(buffer, item)) {
+				/* Should increase buffer size. */
+				abort();
+			}
+		}
+	}
+
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu buffer from many threads.  */
+void test_percpu_memcpy_buffer(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_memcpy_buffer buffer;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&buffer, 0, sizeof(buffer));
+
+	/* Generate buffer entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		/* Worst-case is every item in same CPU. */
+		buffer.c[i].array =
+			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
+				* MEMCPY_BUFFER_ITEM_PER_CPU);
+		assert(buffer.c[i].array);
+		buffer.c[i].buflen = CPU_SETSIZE * MEMCPY_BUFFER_ITEM_PER_CPU;
+		for (j = 1; j <= MEMCPY_BUFFER_ITEM_PER_CPU; j++) {
+			expected_sum += 2 * j + 1;
+
+			/*
+			 * Unlike the pointer-based buffer test, each
+			 * item here is wider than a single word and is
+			 * stored by value in the per-cpu array, so it
+			 * is copied in and out with memcpy rather than
+			 * allocated separately.
+			 */
+			buffer.c[i].array[j - 1].data1 = j;
+			buffer.c[i].array[j - 1].data2 = j + 1;
+			buffer.c[i].offset++;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_memcpy_buffer_thread, &buffer);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_memcpy_buffer_node item;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while (percpu_memcpy_buffer_pop(&buffer, &item)) {
+			sum += item.data1;
+			sum += item.data2;
+		}
+		free(buffer.c[i].array);
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+static void test_signal_interrupt_handler(int signo)
+{
+	signals_delivered++;
+}
+
+static int set_signal_handler(void)
+{
+	int ret = 0;
+	struct sigaction sa;
+	sigset_t sigset;
+
+	ret = sigemptyset(&sigset);
+	if (ret < 0) {
+		perror("sigemptyset");
+		return ret;
+	}
+
+	sa.sa_handler = test_signal_interrupt_handler;
+	sa.sa_mask = sigset;
+	sa.sa_flags = 0;
+	ret = sigaction(SIGUSR1, &sa, NULL);
+	if (ret < 0) {
+		perror("sigaction");
+		return ret;
+	}
+
+	printf_verbose("Signal handler set for SIGUSR1\n");
+
+	return ret;
+}
+
+static void show_usage(int argc, char **argv)
+{
+	printf("Usage : %s <OPTIONS>\n",
+		argv[0]);
+	printf("OPTIONS:\n");
+	printf("	[-1 loops] Number of loops for delay injection 1\n");
+	printf("	[-2 loops] Number of loops for delay injection 2\n");
+	printf("	[-3 loops] Number of loops for delay injection 3\n");
+	printf("	[-4 loops] Number of loops for delay injection 4\n");
+	printf("	[-5 loops] Number of loops for delay injection 5\n");
+	printf("	[-6 loops] Number of loops for delay injection 6\n");
+	printf("	[-7 loops] Number of loops for delay injection 7 (-1 to enable -m)\n");
+	printf("	[-8 loops] Number of loops for delay injection 8 (-1 to enable -m)\n");
+	printf("	[-9 loops] Number of loops for delay injection 9 (-1 to enable -m)\n");
+	printf("	[-m N] Yield/sleep/kill every modulo N (default 0: disabled) (>= 0)\n");
+	printf("	[-y] Yield\n");
+	printf("	[-k] Kill thread with signal\n");
+	printf("	[-s S] S: =0: disabled (default), >0: sleep time (ms)\n");
+	printf("	[-t N] Number of threads (default 200)\n");
+	printf("	[-r N] Number of repetitions per thread (default 5000)\n");
+	printf("	[-d] Disable rseq system call (no initialization)\n");
+	printf("	[-D M] Disable rseq for every M threads\n");
+	printf("	[-T test] Choose test: (s)pinlock, (l)ist, (b)uffer, (m)emcpy, (i)ncrement\n");
+	printf("	[-M] Push into buffer and memcpy buffer with memory barriers.\n");
+	printf("	[-q] Quiet output.\n");
+	printf("	[-h] Show this help.\n");
+	printf("\n");
+}
+
+int main(int argc, char **argv)
+{
+	int i;
+
+	for (i = 1; i < argc; i++) {
+		if (argv[i][0] != '-')
+			continue;
+		switch (argv[i][1]) {
+		case '1':
+		case '2':
+		case '3':
+		case '4':
+		case '5':
+		case '6':
+		case '7':
+		case '8':
+		case '9':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			loop_cnt[argv[i][1] - '0'] = atol(argv[i + 1]);
+			i++;
+			break;
+		case 'm':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_modulo = atol(argv[i + 1]);
+			if (opt_modulo < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 's':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_sleep = atol(argv[i + 1]);
+			if (opt_sleep < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'y':
+			opt_yield = 1;
+			break;
+		case 'k':
+			opt_signal = 1;
+			break;
+		case 'd':
+			opt_disable_rseq = 1;
+			break;
+		case 'D':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_disable_mod = atol(argv[i + 1]);
+			if (opt_disable_mod < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 't':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_threads = atol(argv[i + 1]);
+			if (opt_threads < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'r':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_reps = atoll(argv[i + 1]);
+			if (opt_reps < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'h':
+			show_usage(argc, argv);
+			goto end;
+		case 'T':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_test = *argv[i + 1];
+			switch (opt_test) {
+			case 's':
+			case 'l':
+			case 'i':
+			case 'b':
+			case 'm':
+				break;
+			default:
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'q':
+			quiet = 1;
+			break;
+		case 'M':
+			opt_mb = 1;
+			break;
+		default:
+			show_usage(argc, argv);
+			goto error;
+		}
+	}
+
+	if (set_signal_handler())
+		goto error;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		goto error;
+	switch (opt_test) {
+	case 's':
+		printf_verbose("spinlock\n");
+		test_percpu_spinlock();
+		break;
+	case 'l':
+		printf_verbose("linked list\n");
+		test_percpu_list();
+		break;
+	case 'b':
+		printf_verbose("buffer\n");
+		test_percpu_buffer();
+		break;
+	case 'm':
+		printf_verbose("memcpy buffer\n");
+		test_percpu_memcpy_buffer();
+		break;
+	case 'i':
+		printf_verbose("counter increment\n");
+		test_percpu_inc();
+		break;
+	}
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+end:
+	return 0;
+
+error:
+	return -1;
+}
diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
new file mode 100644
index 000000000000..d2e9f07d569a
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -0,0 +1,535 @@
+/*
+ * rseq-arm.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#define RSEQ_SIG	0x53053053
+
+#define rseq_smp_mb()	__asm__ __volatile__ ("dmb" : : : "memory")
+#define rseq_smp_rmb()	__asm__ __volatile__ ("dmb" : : : "memory")
+#define rseq_smp_wmb()	__asm__ __volatile__ ("dmb" : : : "memory")
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_mb();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_mb();							\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+#define RSEQ_ASM_DEFINE_TABLE(section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		__rseq_str(label) ":\n\t"				\
+		RSEQ_INJECT_ASM(1)					\
+		"adr r0, " __rseq_str(cs_label) "\n\t"			\
+		"str r0, [%[" __rseq_str(rseq_cs) "]]\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"ldr r0, %[" __rseq_str(current_cpu_id) "]\n\t"	\
+		"cmp %[" __rseq_str(cpu_id) "], r0\n\t"		\
+		"bne " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(table_label, label, section, sig,		\
+			teardown, abort_label, version, flags, start_ip,\
+			post_commit_offset, abort_ip)			\
+		__rseq_str(table_label) ":\n\t" 			\
+		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".word " __rseq_str(RSEQ_SIG) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"b %l[" __rseq_str(abort_label) "]\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"b %l[" __rseq_str(cmpfail_label) "]\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expectnot], r0\n\t"
+		"beq 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"str r0, %[load]\n\t"
+		"add r0, %[voffp]\n\t"
+		"ldr r0, [r0]\n\t"
+		/* final store */
+		"str r0, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"Ir"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"add r0, %[count]\n\t"
+		/* final store */
+		"str r0, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [count]"Ir"(count)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"str %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"str %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		"dmb\n\t"	/* full mb provides store-release */
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"ldr r0, %[v2]\n\t"
+		"cmp %[expect2], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"str %[src], %[rseq_scratch0]\n\t"
+		"str %[dst], %[rseq_scratch1]\n\t"
+		"str %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"cmp %[len], #0\n\t" \
+		"beq 333f\n\t" \
+		"222:\n\t" \
+		"ldrb %%r0, [%[src]]\n\t" \
+		"strb %%r0, [%[dst]]\n\t" \
+		"adds %[src], #1\n\t" \
+		"adds %[dst], #1\n\t" \
+		"subs %[len], #1\n\t" \
+		"bne 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"ldr %[len], %[rseq_scratch2]\n\t"
+		"ldr %[dst], %[rseq_scratch1]\n\t"
+		"ldr %[src], %[rseq_scratch0]\n\t"
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"str %[src], %[rseq_scratch0]\n\t"
+		"str %[dst], %[rseq_scratch1]\n\t"
+		"str %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"cmp %[len], #0\n\t" \
+		"beq 333f\n\t" \
+		"222:\n\t" \
+		"ldrb %%r0, [%[src]]\n\t" \
+		"strb %%r0, [%[dst]]\n\t" \
+		"adds %[src], #1\n\t" \
+		"adds %[dst], #1\n\t" \
+		"subs %[len], #1\n\t" \
+		"bne 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		"dmb\n\t"	/* full mb provides store-release */
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"ldr %[len], %[rseq_scratch2]\n\t"
+		"ldr %[dst], %[rseq_scratch1]\n\t"
+		"ldr %[src], %[rseq_scratch0]\n\t"
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
diff --git a/tools/testing/selftests/rseq/rseq-ppc.h b/tools/testing/selftests/rseq/rseq-ppc.h
new file mode 100644
index 000000000000..bff0d97db0ff
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-ppc.h
@@ -0,0 +1,567 @@
+/*
+ * rseq-ppc.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ * (C) Copyright 2016 - Boqun Feng <boqun.feng@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#define RSEQ_SIG	0x53053053
+
+#define rseq_smp_mb()		__asm__ __volatile__ ("sync" : : : "memory")
+#define rseq_smp_lwsync()	__asm__ __volatile__ ("lwsync" : : : "memory")
+#define rseq_smp_rmb()		rseq_smp_lwsync()
+#define rseq_smp_wmb()		rseq_smp_lwsync()
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_lwsync();						\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_lwsync()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_lwsync();						\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+/*
+ * The __rseq_table section can be used by debuggers to better handle
+ * single-stepping through the restartable critical sections.
+ */
+
+#ifdef __PPC64__
+
+#define STORE_WORD	"std "
+#define LOAD_WORD	"ld "
+#define LOADX_WORD	"ldx "
+#define CMP_WORD	"cmpd "
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)			\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
+		".balign 32\n\t"						\
+		__rseq_str(label) ":\n\t"					\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
+		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
+		__rseq_str(label) ":\n\t"					\
+		RSEQ_INJECT_ASM(1)						\
+		"lis %%r17, (" __rseq_str(cs_label) ")@highest\n\t"		\
+		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@higher\n\t"	\
+		"rldicr %%r17, %%r17, 32, 31\n\t"				\
+		"oris %%r17, %%r17, (" __rseq_str(cs_label) ")@high\n\t"	\
+		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
+		"std %%r17, %[" __rseq_str(rseq_cs) "]\n\t"
+
+#else /* #ifdef __PPC64__ */
+
+#define STORE_WORD	"stw "
+#define LOAD_WORD	"lwz "
+#define LOADX_WORD	"lwzx "
+#define CMP_WORD	"cmpw "
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)			\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
+		".balign 32\n\t"						\
+		__rseq_str(label) ":\n\t"					\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
+		/* 32-bit only supported on BE */				\
+		".long 0x0, " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
+		__rseq_str(label) ":\n\t"					\
+		RSEQ_INJECT_ASM(1)						\
+		"lis %%r17, (" __rseq_str(cs_label) ")@ha\n\t"			\
+		"addi %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
+		"stw %%r17, %[" __rseq_str(rseq_cs) "]\n\t"
+
+#endif /* #ifdef __PPC64__ */
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)			\
+		RSEQ_INJECT_ASM(2)						\
+		"lwz %%r17, %[" __rseq_str(current_cpu_id) "]\n\t"		\
+		"cmpw cr7, %[" __rseq_str(cpu_id) "], %%r17\n\t"		\
+		"bne- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label)	\
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
+		".long " __rseq_str(sig) "\n\t"					\
+		__rseq_str(label) ":\n\t"					\
+		teardown							\
+		"b %l[" __rseq_str(abort_label) "]\n\t"			\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label)	\
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
+		__rseq_str(label) ":\n\t"					\
+		teardown							\
+		"b %l[" __rseq_str(cmpfail_label) "]\n\t"			\
+		".popsection\n\t"
+
+
+/*
+ * RSEQ_ASM_OPs: asm operations for rseq
+ *	RSEQ_ASM_OP_R_*: has hard-coded registers in it
+ *	RSEQ_ASM_OP_* (else): doesn't have hard-coded registers (except cr7)
+ */
+#define RSEQ_ASM_OP_CMPEQ(var, expect, label)					\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t"		\
+		"bne- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_OP_CMPNE(var, expectnot, label)				\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		CMP_WORD "cr7, %%r17, %[" __rseq_str(expectnot) "]\n\t"	\
+		"beq- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_OP_STORE(value, var)						\
+		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"
+
+/* Load @var to r17 */
+#define RSEQ_ASM_OP_R_LOAD(var)							\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
+
+/* Store r17 to @var */
+#define RSEQ_ASM_OP_R_STORE(var)						\
+		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
+
+/* Add @count to r17 */
+#define RSEQ_ASM_OP_R_ADD(count)						\
+		"add %%r17, %[" __rseq_str(count) "], %%r17\n\t"
+
+/* Load (r17 + voffp) to r17 */
+#define RSEQ_ASM_OP_R_LOADX(voffp)						\
+		LOADX_WORD "%%r17, %[" __rseq_str(voffp) "], %%r17\n\t"
+
+/* TODO: implement a faster memcpy. */
+#define RSEQ_ASM_OP_R_MEMCPY() \
+		"cmpdi %%r19, 0\n\t" \
+		"beq 333f\n\t" \
+		"addi %%r20, %%r20, -1\n\t" \
+		"addi %%r21, %%r21, -1\n\t" \
+		"222:\n\t" \
+		"lbzu %%r18, 1(%%r20)\n\t" \
+		"stbu %%r18, 1(%%r21)\n\t" \
+		"addi %%r19, %%r19, -1\n\t" \
+		"cmpdi %%r19, 0\n\t" \
+		"bne 222b\n\t" \
+		"333:\n\t"
+
+#define RSEQ_ASM_OP_R_FINAL_STORE(var, post_commit_label)			\
+		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		__rseq_str(post_commit_label) ":\n\t"
+
+#define RSEQ_ASM_OP_FINAL_STORE(value, var, post_commit_label)			\
+		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"	\
+		__rseq_str(post_commit_label) ":\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v not equal to @expectnot */
+		RSEQ_ASM_OP_CMPNE(v, expectnot, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* load the value of @v */
+		RSEQ_ASM_OP_R_LOAD(v)
+		/* store it in @load */
+		RSEQ_ASM_OP_R_STORE(load)
+		/* dereference voffp(v) */
+		RSEQ_ASM_OP_R_LOADX(voffp)
+		/* final store the value at voffp(v) */
+		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"b"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* load the value of @v */
+		RSEQ_ASM_OP_R_LOAD(v)
+		/* add @count to it */
+		RSEQ_ASM_OP_R_ADD(count)
+		/* final store */
+		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"r"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		RSEQ_ASM_OP_STORE(newv2, v2)
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		RSEQ_ASM_OP_STORE(newv2, v2)
+		RSEQ_INJECT_ASM(5)
+		/* for 'release' */
+		"lwsync\n\t"
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* cmp @v2 equal to @expect2 */
+		RSEQ_ASM_OP_CMPEQ(v2, expect2, 5f)
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		/* setup for memcpy */
+		"mr %%r19, %[len]\n\t" \
+		"mr %%r20, %[src]\n\t" \
+		"mr %%r21, %[dst]\n\t" \
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		RSEQ_ASM_OP_R_MEMCPY()
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		/* setup for memcpy */
+		"mr %%r19, %[len]\n\t" \
+		"mr %%r20, %[src]\n\t" \
+		"mr %%r21, %[dst]\n\t" \
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		RSEQ_ASM_OP_R_MEMCPY()
+		RSEQ_INJECT_ASM(5)
+		/* for 'release' */
+		"lwsync\n\t"
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+#undef STORE_WORD
+#undef LOAD_WORD
+#undef LOADX_WORD
+#undef CMP_WORD
diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h
new file mode 100644
index 000000000000..7e4c21751c52
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-x86.h
@@ -0,0 +1,898 @@
+/*
+ * rseq-x86.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdint.h>
+
+#define RSEQ_SIG	0x53053053
+
+#ifdef __x86_64__
+
+#define rseq_smp_mb()	__asm__ __volatile__ ("mfence" : : : "memory")
+#define rseq_smp_rmb()	barrier()
+#define rseq_smp_wmb()	barrier()
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	barrier();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	barrier();							\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		__rseq_str(label) ":\n\t"				\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		__rseq_str(label) ":\n\t"				\
+		RSEQ_INJECT_ASM(1)					\
+		"leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t"	\
+		"movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
+		"jnz " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		/* Disassembler-friendly signature: nopl <sig>(%rip). */\
+		".byte 0x0f, 0x1f, 0x05\n\t"				\
+		".long " __rseq_str(sig) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
+		".popsection\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expectnot]\n\t"
+		"jz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"movq %[v], %%rax\n\t"
+		"movq %%rax, %[load]\n\t"
+		"addq %[voffp], %%rax\n\t"
+		"movq (%%rax), %%rax\n\t"
+		/* final store */
+		"movq %%rax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"er"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* final store */
+		"addq %[count], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"er"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movq %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* x86-64 is TSO. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	return rseq_cmpeqv_trystorev_storev(v, expect, v2, newv2,
+			newv, cpu);
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"cmpq %[v2], %[expect2]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint64_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movq %[src], %[rseq_scratch0]\n\t"
+		"movq %[dst], %[rseq_scratch1]\n\t"
+		"movq %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movq %[rseq_scratch2], %[len]\n\t"
+		"movq %[rseq_scratch1], %[dst]\n\t"
+		"movq %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movq %[rseq_scratch2], %[len]\n\t"
+			"movq %[rseq_scratch1], %[dst]\n\t"
+			"movq %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movq %[rseq_scratch2], %[len]\n\t"
+			"movq %[rseq_scratch1], %[dst]\n\t"
+			"movq %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* x86-64 is TSO. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	return rseq_cmpeqv_trymemcpy_storev(v, expect, dst, src,
+			len, newv, cpu);
+}
+
+#elif __i386__
+
+/*
+ * Support older 32-bit architectures that do not implement fence
+ * instructions.
+ */
+#define rseq_smp_mb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+#define rseq_smp_rmb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+#define rseq_smp_wmb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_mb();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_mb();							\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+/*
+ * Use eax as scratch register and take memory operands as input to
+ * lessen register pressure. Especially needed when compiling at -O0.
+ */
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		__rseq_str(label) ":\n\t"				\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".long " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		__rseq_str(label) ":\n\t"				\
+		RSEQ_INJECT_ASM(1)					\
+		"movl $" __rseq_str(cs_label) ", %[rseq_cs]\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
+		"jnz " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		/* Disassembler-friendly signature: nopl <sig>. */\
+		".byte 0x0f, 0x1f, 0x05\n\t"				\
+		".long " __rseq_str(sig) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
+		".popsection\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expectnot]\n\t"
+		"jz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"movl %[v], %%eax\n\t"
+		"movl %%eax, %[load]\n\t"
+		"addl %[voffp], %%eax\n\t"
+		"movl (%%eax), %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"ir"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* final store */
+		"addl %[count], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"ir"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movl %[newv2], %%eax\n\t"
+		"movl %%eax, %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"m"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %[v], %%eax\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movl %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		"lock; addl $0,0(%%esp)\n\t"
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"cmpl %[expect2], %[v2]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"m"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* TODO: implement a faster memcpy. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movl %[src], %[rseq_scratch0]\n\t"
+		"movl %[dst], %[rseq_scratch1]\n\t"
+		"movl %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %%eax, %[v]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movl %[rseq_scratch2], %[len]\n\t"
+		"movl %[rseq_scratch1], %[dst]\n\t"
+		"movl %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"m"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* TODO: implement a faster memcpy. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movl %[src], %[rseq_scratch0]\n\t"
+		"movl %[dst], %[rseq_scratch1]\n\t"
+		"movl %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %%eax, %[v]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		"lock; addl $0,0(%%esp)\n\t"
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movl %[rseq_scratch2], %[len]\n\t"
+		"movl %[rseq_scratch1], %[dst]\n\t"
+		"movl %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"m"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+#endif
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
new file mode 100644
index 000000000000..3db193c0afb0
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -0,0 +1,116 @@
+/*
+ * rseq.c
+ *
+ * Copyright (C) 2016 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "rseq.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+__attribute__((tls_model("initial-exec"))) __thread
+volatile struct rseq __rseq_abi = {
+	.cpu_id = -1,
+};
+
+static __attribute__((tls_model("initial-exec"))) __thread
+volatile int refcount;
+
+static void signal_off_save(sigset_t *oldset)
+{
+	sigset_t set;
+	int ret;
+
+	sigfillset(&set);
+	ret = pthread_sigmask(SIG_BLOCK, &set, oldset);
+	if (ret)
+		abort();
+}
+
+static void signal_restore(sigset_t oldset)
+{
+	int ret;
+
+	ret = pthread_sigmask(SIG_SETMASK, &oldset, NULL);
+	if (ret)
+		abort();
+}
+
+static int sys_rseq(volatile struct rseq *rseq_abi, uint32_t rseq_len,
+		int flags, uint32_t sig)
+{
+	return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig);
+}
+
+int rseq_register_current_thread(void)
+{
+	int rc, ret = 0;
+	sigset_t oldset;
+
+	signal_off_save(&oldset);
+	if (refcount++)
+		goto end;
+	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG);
+	if (!rc) {
+		assert(rseq_current_cpu_raw() >= 0);
+		goto end;
+	}
+	if (errno != EBUSY)
+		__rseq_abi.cpu_id = -2;
+	ret = -1;
+	refcount--;
+end:
+	signal_restore(oldset);
+	return ret;
+}
+
+int rseq_unregister_current_thread(void)
+{
+	int rc, ret = 0;
+	sigset_t oldset;
+
+	signal_off_save(&oldset);
+	if (--refcount)
+		goto end;
+	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq),
+			RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+	if (!rc)
+		goto end;
+	ret = -1;
+end:
+	signal_restore(oldset);
+	return ret;
+}
+
+int32_t rseq_fallback_current_cpu(void)
+{
+	int32_t cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
new file mode 100644
index 000000000000..26c8ea01e940
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -0,0 +1,154 @@
+/*
+ * rseq.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RSEQ_H
+#define RSEQ_H
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <signal.h>
+#include <sched.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sched.h>
+#include <linux/rseq.h>
+
+/*
+ * Empty code injection macros, override when testing.
+ * It is important to consider that the ASM injection macros need to be
+ * fully reentrant (e.g. do not modify the stack).
+ */
+#ifndef RSEQ_INJECT_ASM
+#define RSEQ_INJECT_ASM(n)
+#endif
+
+#ifndef RSEQ_INJECT_C
+#define RSEQ_INJECT_C(n)
+#endif
+
+#ifndef RSEQ_INJECT_INPUT
+#define RSEQ_INJECT_INPUT
+#endif
+
+#ifndef RSEQ_INJECT_CLOBBER
+#define RSEQ_INJECT_CLOBBER
+#endif
+
+#ifndef RSEQ_INJECT_FAILED
+#define RSEQ_INJECT_FAILED
+#endif
+
+extern __thread volatile struct rseq __rseq_abi;
+
+#define rseq_likely(x)		__builtin_expect(!!(x), 1)
+#define rseq_unlikely(x)	__builtin_expect(!!(x), 0)
+#define rseq_barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define RSEQ_ACCESS_ONCE(x)	(*(__volatile__  __typeof__(x) *)&(x))
+#define RSEQ_WRITE_ONCE(x, v)	__extension__ ({ RSEQ_ACCESS_ONCE(x) = (v); })
+#define RSEQ_READ_ONCE(x)	RSEQ_ACCESS_ONCE(x)
+
+#define __rseq_str_1(x)	#x
+#define __rseq_str(x)		__rseq_str_1(x)
+
+#if defined(__x86_64__) || defined(__i386__)
+#include <rseq-x86.h>
+#elif defined(__ARMEL__)
+#include <rseq-arm.h>
+#elif defined(__PPC__)
+#include <rseq-ppc.h>
+#else
+#error unsupported target
+#endif
+
+/*
+ * Register rseq for the current thread. This needs to be called once
+ * by any thread which uses restartable sequences, before they start
+ * using restartable sequences, to ensure restartable sequences
+ * succeed. A restartable sequence executed from a non-registered
+ * thread will always fail.
+ */
+int rseq_register_current_thread(void);
+
+/*
+ * Unregister rseq for current thread.
+ */
+int rseq_unregister_current_thread(void);
+
+/*
+ * Restartable sequence fallback for reading the current CPU number.
+ */
+int32_t rseq_fallback_current_cpu(void);
+
+/*
+ * Values returned can be either the current CPU number, -1 (rseq is
+ * uninitialized), or -2 (rseq initialization has failed).
+ */
+static inline int32_t rseq_current_cpu_raw(void)
+{
+	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id);
+}
+
+/*
+ * Returns a possible CPU number, which is typically the current CPU.
+ * The returned CPU number can be used to prepare for an rseq critical
+ * section, which will confirm whether the cpu number is indeed the
+ * current one, and whether rseq is initialized.
+ *
+ * The CPU number returned by rseq_cpu_start should always be validated
+ * by passing it to a rseq asm sequence, or by comparing it to the
+ * return value of rseq_current_cpu_raw() if the rseq asm sequence
+ * does not need to be invoked.
+ */
+static inline uint32_t rseq_cpu_start(void)
+{
+	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id_start);
+}
+
+static inline uint32_t rseq_current_cpu(void)
+{
+	int32_t cpu;
+
+	cpu = rseq_current_cpu_raw();
+	if (rseq_unlikely(cpu < 0))
+		cpu = rseq_fallback_current_cpu();
+	return cpu;
+}
+
+/*
+ * rseq_prepare_unload() should be invoked by each thread using rseq_finish*()
+ * at least once between their last rseq_finish*() and library unload of the
+ * library defining the rseq critical section (struct rseq_cs). This also
+ * applies to use of rseq in code generated by JIT: rseq_prepare_unload()
+ * should be invoked at least once by each thread using rseq_finish*() before
+ * reclaim of the memory holding the struct rseq_cs.
+ */
+static inline void rseq_prepare_unload(void)
+{
+	__rseq_abi.rseq_cs = 0;
+}
+
+#endif  /* RSEQ_H */
diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
new file mode 100755
index 000000000000..c7475a2bef11
--- /dev/null
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -0,0 +1,124 @@
+#!/bin/bash
+
+EXTRA_ARGS=${@}
+
+OLDIFS="$IFS"
+IFS=$'\n'
+TEST_LIST=(
+	"-T s"
+	"-T l"
+	"-T b"
+	"-T b -M"
+	"-T m"
+	"-T m -M"
+	"-T i"
+)
+
+TEST_NAME=(
+	"spinlock"
+	"list"
+	"buffer"
+	"buffer with barrier"
+	"memcpy"
+	"memcpy with barrier"
+	"increment"
+)
+IFS="$OLDIFS"
+
+function do_tests()
+{
+	local i=0
+	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
+		echo "Running test ${TEST_NAME[$i]}"
+		./param_test ${TEST_LIST[$i]} ${@} ${EXTRA_ARGS} || exit 1
+		let "i++"
+	done
+}
+
+echo "Default parameters"
+do_tests
+
+echo "Loop injection: 10000 loops"
+
+OLDIFS="$IFS"
+IFS=$'\n'
+INJECT_LIST=(
+	"1"
+	"2"
+	"3"
+	"4"
+	"5"
+	"6"
+	"7"
+	"8"
+	"9"
+)
+IFS="$OLDIFS"
+
+NR_LOOPS=10000
+
+i=0
+while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
+	echo "Injecting at <${INJECT_LIST[$i]}>"
+	do_tests -${INJECT_LIST[i]} ${NR_LOOPS}
+	let "i++"
+done
+NR_LOOPS=
+
+function inject_blocking()
+{
+	OLDIFS="$IFS"
+	IFS=$'\n'
+	INJECT_LIST=(
+		"7"
+		"8"
+		"9"
+	)
+	IFS="$OLDIFS"
+
+	NR_LOOPS=-1
+
+	i=0
+	while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
+		echo "Injecting at <${INJECT_LIST[$i]}>"
+		do_tests -${INJECT_LIST[i]} -1 ${@}
+		let "i++"
+	done
+	NR_LOOPS=
+}
+
+echo "Yield injection (25%)"
+inject_blocking -m 4 -y -r 100
+
+echo "Yield injection (50%)"
+inject_blocking -m 2 -y -r 100
+
+echo "Yield injection (100%)"
+inject_blocking -m 1 -y -r 100
+
+echo "Kill injection (25%)"
+inject_blocking -m 4 -k -r 100
+
+echo "Kill injection (50%)"
+inject_blocking -m 2 -k -r 100
+
+echo "Kill injection (100%)"
+inject_blocking -m 1 -k -r 100
+
+echo "Sleep injection (1ms, 25%)"
+inject_blocking -m 4 -s 1 -r 100
+
+echo "Sleep injection (1ms, 50%)"
+inject_blocking -m 2 -s 1 -r 100
+
+echo "Sleep injection (1ms, 100%)"
+inject_blocking -m 1 -s 1 -r 100
+
+echo "Disable rseq for 25% threads"
+do_tests -D 4
+
+echo "Disable rseq for 50% threads"
+do_tests -D 2
+
+echo "Disable rseq"
+do_tests -d
-- 
2.11.0


* [RFC PATCH v2 for 4.15 13/14] Restartable sequences: Provide self-tests
@ 2017-11-06 20:56   ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest-u79uwXL29TY76Z2rM5mHXA

Implements two basic tests of RSEQ functionality, and one more
exhaustive parameterizable test.

The first, "basic_test", only asserts that RSEQ works moderately
correctly, e.g. that the CPU ID pointer works.

"basic_percpu_ops_test" is a slightly more "realistic" variant,
implementing a few simple per-cpu operations and testing their
correctness.

"param_test" is a parametrizable restartable sequences test. See
the "--help" output for usage.

A run_param_test.sh script runs many variants of the parametrizable
tests.

As part of those tests, a helper library "rseq" implements a user-space
API around restartable sequences. It uses the cpu_opv system call as
fallback when single-stepped by a debugger. It exposes the instruction
pointer addresses where the rseq assembly blocks begin and end, as well
as the associated abort instruction pointer, in the __rseq_table
section. This section lets debuggers know where to place
breakpoints when single-stepping through assembly blocks which may be
aborted at any point by the kernel.
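
As a rough illustration of how a debugger could consume such a table, here
is a minimal sketch. The entry layout, the field names (start_ip, end_ip,
abort_ip), the half-open [start_ip, end_ip) range and both helper functions
are assumptions made for this example only; they are not the format emitted
by the rseq-*.h headers in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout: one record per rseq assembly block. */
struct sketch_rseq_table_entry {
	uintptr_t start_ip;	/* first instruction of the block */
	uintptr_t end_ip;	/* first instruction after the block (assumed exclusive) */
	uintptr_t abort_ip;	/* where execution resumes if the kernel aborts the block */
};

/* Return the entry whose [start_ip, end_ip) range contains ip, or NULL. */
const struct sketch_rseq_table_entry *
sketch_find_entry(const struct sketch_rseq_table_entry *table, size_t nr,
		uintptr_t ip)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (ip >= table[i].start_ip && ip < table[i].end_ip)
			return &table[i];
	}
	return NULL;
}

/*
 * If ip lies inside an rseq block, store the two addresses where a
 * debugger could place breakpoints instead of single-stepping (the end
 * of the block and its abort target) and return 2. Return 0 otherwise.
 */
int sketch_breakpoints_for(const struct sketch_rseq_table_entry *table,
		size_t nr, uintptr_t ip, uintptr_t out[2])
{
	const struct sketch_rseq_table_entry *e;

	e = sketch_find_entry(table, nr, ip);
	if (!e)
		return 0;
	out[0] = e->end_ip;
	out[1] = e->abort_ip;
	return 2;
}
```

Breaking on both the end of the block and its abort target covers either
outcome of a block that the kernel may restart at any instruction.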

The rseq library exposes APIs that present the fast-path operations.
Usage from user-space is, e.g. for a counter increment:

    cpu = rseq_cpu_start();
    ret = rseq_addv(&data->c[cpu].count, 1, cpu);
    if (likely(!ret))
        return 0;        /* Success. */
    do {
        cpu = rseq_current_cpu();
        ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
        if (likely(!ret))
            return 0;    /* Success. */
    } while (ret > 0 || errno == EAGAIN);
    perror("cpu_op_addv");
    return -1;           /* Unexpected error. */
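
The control flow above can be exercised without kernel support by swapping
the real primitives for stand-ins. The sketch below does that: every
sketch_* name, the two failure counters and NR_CPUS_SKETCH are hypothetical
helpers invented for this illustration, and the stubs only mimic the return
conventions used in the snippet above (0 on success, retry while
ret > 0 or errno == EAGAIN). They are not the rseq or cpu_opv
implementations.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_CPUS_SKETCH	4	/* hypothetical: small fixed CPU count */

struct sketch_counter_entry {
	intptr_t count;
};

struct sketch_counter {
	struct sketch_counter_entry c[NR_CPUS_SKETCH];
};

/* Test knobs: how many times each stand-in should fail before succeeding. */
static int fast_aborts_left;
static int slow_eagain_left;

/* Stand-in for rseq_cpu_start() and rseq_current_cpu(): always CPU 0. */
static int sketch_cpu(void)
{
	return 0;
}

/* Stand-in for rseq_addv(): 0 on success, -1 when the sequence aborts. */
static int sketch_rseq_addv(intptr_t *v, intptr_t count, int cpu)
{
	(void)cpu;
	if (fast_aborts_left > 0) {
		fast_aborts_left--;
		return -1;
	}
	*v += count;
	return 0;
}

/* Stand-in for cpu_op_addv(): 0 on success, -1 with errno set to EAGAIN. */
static int sketch_cpu_op_addv(intptr_t *v, intptr_t count, int cpu)
{
	(void)cpu;
	if (slow_eagain_left > 0) {
		slow_eagain_left--;
		errno = EAGAIN;
		return -1;
	}
	*v += count;
	return 0;
}

/* Fast path first; on abort, fall back to the slow path and retry it. */
int sketch_percpu_inc(struct sketch_counter *data)
{
	int cpu, ret;

	cpu = sketch_cpu();
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	ret = sketch_rseq_addv(&data->c[cpu].count, 1, cpu);
	if (!ret)
		return 0;	/* Success on the fast path. */
	do {
		cpu = sketch_cpu();
		ret = sketch_cpu_op_addv(&data->c[cpu].count, 1, cpu);
		if (!ret)
			return 0;	/* Success on the slow path. */
	} while (ret > 0 || errno == EAGAIN);
	return -1;		/* Unexpected error. */
}
```

The fast path is attempted exactly once per call; only the fallback loops,
which keeps the common case free of retry bookkeeping.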

PowerPC tests have been implemented by Boqun Feng.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
CC: Russell King <linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
CC: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
CC: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
CC: Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
CC: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
CC: Dave Watson <davejwatson-b10kYP2dOMg@public.gmane.org>
CC: Chris Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
CC: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
CC: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
CC: Ben Maurer <bmaurer-b10kYP2dOMg@public.gmane.org>
CC: Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>
CC: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
CC: Josh Triplett <josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org>
CC: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
CC: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
CC: Boqun Feng <boqun.feng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Shuah Khan <shuah-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
CC: linux-kselftest-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---

Changes since v1:
- Provide abort-ip signature: The abort-ip signature is located just
  before the abort-ip target. It is currently hardcoded, but a
  user-space application could use the __rseq_table to iterate on all
  abort-ip targets and use a random value as signature if needed in the
  future.
- Add rseq_prepare_unload(): Libraries and JIT code using rseq critical
  sections need to issue rseq_prepare_unload() on each thread at least
  once before reclaim of struct rseq_cs.
- Use initial-exec TLS model, non-weak symbol: The initial-exec model is
  signal-safe, whereas the global-dynamic model is not.  Remove the
  "weak" symbol attribute from the __rseq_abi in rseq.c. The librseq.so
  library will have ownership of that symbol, and there is no reason for
  an application or user library to try to define that symbol.
  The expected use is to link against librseq.so, which owns and provides
  that symbol.
- Set cpu_id to -2 on register error
- Add rseq_len syscall parameter, rseq_cs version
- Ensure disassembler-friendly signature: x86 32/64 disassemblers have a
  hard time decoding the instruction stream after a bad instruction. Use
  a nopl instruction to encode the signature. Suggested by Andy Lutomirski.
- Exercise parametrized test variants in a shell script.
- Restartable sequences selftests: Remove use of event counter.
- Use cpu_id_start field:  With the cpu_id_start field, the C
  preparation phase of the fast-path does not need to compare cpu_id < 0
  anymore.
- Signal-safe registration and refcounting: Allow libraries using
  librseq.so to register it from signal handlers.
---
 MAINTAINERS                                        |    1 +
 tools/testing/selftests/Makefile                   |    1 +
 .../testing/selftests/cpu-opv/basic_cpu_opv_test.c |   13 +-
 tools/testing/selftests/rseq/.gitignore            |    4 +
 tools/testing/selftests/rseq/Makefile              |   22 +
 .../testing/selftests/rseq/basic_percpu_ops_test.c |  333 +++++
 tools/testing/selftests/rseq/basic_test.c          |   55 +
 tools/testing/selftests/rseq/param_test.c          | 1285 ++++++++++++++++++++
 tools/testing/selftests/rseq/rseq-arm.h            |  535 ++++++++
 tools/testing/selftests/rseq/rseq-ppc.h            |  567 +++++++++
 tools/testing/selftests/rseq/rseq-x86.h            |  898 ++++++++++++++
 tools/testing/selftests/rseq/rseq.c                |  116 ++
 tools/testing/selftests/rseq/rseq.h                |  154 +++
 tools/testing/selftests/rseq/run_param_test.sh     |  124 ++
 14 files changed, 4103 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/rseq/.gitignore
 create mode 100644 tools/testing/selftests/rseq/Makefile
 create mode 100644 tools/testing/selftests/rseq/basic_percpu_ops_test.c
 create mode 100644 tools/testing/selftests/rseq/basic_test.c
 create mode 100644 tools/testing/selftests/rseq/param_test.c
 create mode 100644 tools/testing/selftests/rseq/rseq-arm.h
 create mode 100644 tools/testing/selftests/rseq/rseq-ppc.h
 create mode 100644 tools/testing/selftests/rseq/rseq-x86.h
 create mode 100644 tools/testing/selftests/rseq/rseq.c
 create mode 100644 tools/testing/selftests/rseq/rseq.h
 create mode 100755 tools/testing/selftests/rseq/run_param_test.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 54e11f0569e0..1022b5f51cd1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11502,6 +11502,7 @@ S:	Supported
 F:	kernel/rseq.c
 F:	include/uapi/linux/rseq.h
 F:	include/trace/events/rseq.h
+F:	tools/testing/selftests/rseq/
 
 RFKILL
 M:	Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index c66e5e67cfab..b7fcd7bcb87e 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -25,6 +25,7 @@ TARGETS += nsfs
 TARGETS += powerpc
 TARGETS += pstore
 TARGETS += ptrace
+TARGETS += rseq
 TARGETS += seccomp
 TARGETS += sigaltstack
 TARGETS += size
diff --git a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
index 23072dcf5612..6b624f1939ea 100644
--- a/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
+++ b/tools/testing/selftests/cpu-opv/basic_cpu_opv_test.c
@@ -19,6 +19,8 @@
 #define TESTBUFLEN	4096
 #define TESTBUFLEN_CMP	16
 
+#define TESTBUFLEN_PAGE_MAX	65536
+
 static int test_compare_eq_op(char *a, char *b, size_t len)
 {
 	struct cpu_op opvec[] = {
@@ -1047,20 +1049,21 @@ static int test_too_many_ops(void)
 	return 0;
 }
 
+/* Use 64kB len, largest page size known on Linux. */
 static int test_memcpy_single_too_large(void)
 {
 	int i, ret;
-	char buf1[TESTBUFLEN + 1];
-	char buf2[TESTBUFLEN + 1];
+	char buf1[TESTBUFLEN_PAGE_MAX + 1];
+	char buf2[TESTBUFLEN_PAGE_MAX + 1];
 	const char *test_name = "test_memcpy_single_too_large";
 
 	printf("Testing %s\n", test_name);
 
 	/* Test memcpy */
-	for (i = 0; i < TESTBUFLEN + 1; i++)
+	for (i = 0; i < TESTBUFLEN_PAGE_MAX + 1; i++)
 		buf1[i] = (char)i;
-	memset(buf2, 0, TESTBUFLEN + 1);
-	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN + 1);
+	memset(buf2, 0, TESTBUFLEN_PAGE_MAX + 1);
+	ret = test_memcpy_op(buf2, buf1, TESTBUFLEN_PAGE_MAX + 1);
 	if (!ret || (ret < 0 && errno != EINVAL)) {
 		printf("%s returned with %d, errno: %s\n",
 			test_name, ret, strerror(errno));
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
new file mode 100644
index 000000000000..9409c3db99b2
--- /dev/null
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -0,0 +1,4 @@
+basic_percpu_ops_test
+basic_test
+basic_rseq_op_test
+param_test
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
new file mode 100644
index 000000000000..e9b0562dd450
--- /dev/null
+++ b/tools/testing/selftests/rseq/Makefile
@@ -0,0 +1,22 @@
+CFLAGS += -O2 -Wall -g -I./ -I../cpu-opv/ -I../../../../usr/include/ -L./ -Wl,-rpath=./
+LDLIBS += -lpthread
+
+TEST_GEN_PROGS = basic_test basic_percpu_ops_test param_test \
+		librseq.so libcpu-op.so
+
+ALL: $(TEST_GEN_PROGS)
+
+librseq.so: rseq.c rseq.h rseq-*.h
+	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
+
+libcpu-op.so: ../cpu-opv/cpu-op.c ../cpu-opv/cpu-op.h
+	$(CC) $(CFLAGS) -shared -fPIC $< $(LDLIBS) -o $@
+
+# Own recipe because we only want to build against 1st prerequisite, but
+# still track changes to header files.
+%: %.c librseq.so libcpu-op.so rseq.h rseq-*.h ../cpu-opv/cpu-op.h
+	$(CC) $(CFLAGS) $< $(LDLIBS) -lrseq -lcpu-op -o $@
+
+TEST_PROGS = run_param_test.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
new file mode 100644
index 000000000000..e5f7fed06a03
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
@@ -0,0 +1,333 @@
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "rseq.h"
+#include "cpu-op.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+struct percpu_lock_entry {
+	intptr_t v;
+} __attribute__((aligned(128)));
+
+struct percpu_lock {
+	struct percpu_lock_entry c[CPU_SETSIZE];
+};
+
+struct test_data_entry {
+	intptr_t count;
+} __attribute__((aligned(128)));
+
+struct spinlock_test_data {
+	struct percpu_lock lock;
+	struct test_data_entry c[CPU_SETSIZE];
+	int reps;
+};
+
+struct percpu_list_node {
+	intptr_t data;
+	struct percpu_list_node *next;
+};
+
+struct percpu_list_entry {
+	struct percpu_list_node *head;
+} __attribute__((aligned(128)));
+
+struct percpu_list {
+	struct percpu_list_entry c[CPU_SETSIZE];
+};
+
+/* A simple percpu spinlock.  Returns the cpu lock was acquired on. */
+int rseq_percpu_lock(struct percpu_lock *lock)
+{
+	int cpu;
+
+	for (;;) {
+		int ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
+				0, 1, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			continue;	/* Retry. */
+#endif
+	slowpath:
+		__attribute__((unused));
+		/* Fallback on cpu_opv system call. */
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	/*
+	 * Acquire semantic when taking lock after control dependency.
+	 * Matches rseq_smp_store_release().
+	 */
+	rseq_smp_acquire__after_ctrl_dep();
+	return cpu;
+}
+
+void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
+{
+	assert(lock->c[cpu].v == 1);
+	/*
+	 * Release lock, with release semantic. Matches
+	 * rseq_smp_acquire__after_ctrl_dep().
+	 */
+	rseq_smp_store_release(&lock->c[cpu].v, 0);
+}
+
+void *test_percpu_spinlock_thread(void *arg)
+{
+	struct spinlock_test_data *data = arg;
+	int i, cpu;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	for (i = 0; i < data->reps; i++) {
+		cpu = rseq_percpu_lock(&data->lock);
+		data->c[cpu].count++;
+		rseq_percpu_unlock(&data->lock, cpu);
+	}
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	return NULL;
+}
+
+/*
+ * A simple test which implements a sharded counter using a per-cpu
+ * lock.  Obviously real applications might prefer to simply use a
+ * per-cpu increment; however, this is reasonable for a test and the
+ * lock can be extended to synchronize more complicated operations.
+ */
+void test_percpu_spinlock(void)
+{
+	const int num_threads = 200;
+	int i;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct spinlock_test_data data;
+
+	memset(&data, 0, sizeof(data));
+	data.reps = 5000;
+
+	for (i = 0; i < num_threads; i++)
+		pthread_create(&test_threads[i], NULL,
+			test_percpu_spinlock_thread, &data);
+
+	for (i = 0; i < num_threads; i++)
+		pthread_join(test_threads[i], NULL);
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)data.reps * num_threads);
+}
+
+int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
+{
+	intptr_t *targetptr, newval, expect;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load list->c[cpu].head with single-copy atomicity. */
+	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+	newval = (intptr_t)node;
+	targetptr = (intptr_t *)&list->c[cpu].head;
+	node->next = (struct percpu_list_node *)expect;
+	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
+	if (likely(!ret))
+		return cpu;
+#endif
+	/* Fallback on cpu_opv system call. */
+	slowpath:
+		__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load list->c[cpu].head with single-copy atomicity. */
+		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+		newval = (intptr_t)node;
+		targetptr = (intptr_t *)&list->c[cpu].head;
+		node->next = (struct percpu_list_node *)expect;
+		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return cpu;
+}
+
+/*
+ * Unlike a traditional lock-less linked list; the availability of a
+ * rseq primitive allows us to implement pop without concerns over
+ * ABA-type races.
+ */
+struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
+{
+	struct percpu_list_node *head;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
+		(intptr_t)NULL,
+		offsetof(struct percpu_list_node, next),
+		(intptr_t *)&head, cpu);
+	if (likely(!ret))
+		return head;
+	if (ret > 0)
+		return NULL;
+#endif
+	/* Fallback on cpu_opv system call. */
+	slowpath:
+		__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpnev_storeoffp_load(
+			(intptr_t *)&list->c[cpu].head,
+			(intptr_t)NULL,
+			offsetof(struct percpu_list_node, next),
+			(intptr_t *)&head, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			return NULL;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_list_thread(void *arg)
+{
+	int i;
+	struct percpu_list *list = (struct percpu_list *)arg;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	for (i = 0; i < 100000; i++) {
+		struct percpu_list_node *node = percpu_list_pop(list);
+
+		sched_yield();  /* encourage shuffling */
+		if (node)
+			percpu_list_push(list, node);
+	}
+
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu linked list from many threads.  */
+void test_percpu_list(void)
+{
+	int i, j;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_list list;
+	pthread_t test_threads[200];
+	cpu_set_t allowed_cpus;
+
+	memset(&list, 0, sizeof(list));
+
+	/* Generate list entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		for (j = 1; j <= 100; j++) {
+			struct percpu_list_node *node;
+
+			expected_sum += j;
+
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			node->next = list.c[i].head;
+			list.c[i].head = node;
+		}
+	}
+
+	for (i = 0; i < 200; i++)
+		assert(pthread_create(&test_threads[i], NULL,
+			test_percpu_list_thread, &list) == 0);
+
+	for (i = 0; i < 200; i++)
+		pthread_join(test_threads[i], NULL);
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_list_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_list_pop(&list))) {
+			sum += node->data;
+			free(node);
+		}
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+int main(int argc, char **argv)
+{
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto error;
+	}
+	printf("spinlock\n");
+	test_percpu_spinlock();
+	printf("percpu_list\n");
+	test_percpu_list();
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto error;
+	}
+	return 0;
+
+error:
+	return -1;
+}
+
diff --git a/tools/testing/selftests/rseq/basic_test.c b/tools/testing/selftests/rseq/basic_test.c
new file mode 100644
index 000000000000..e2086b3885d7
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_test.c
@@ -0,0 +1,55 @@
+/*
+ * Basic test coverage for critical regions and rseq_current_cpu().
+ */
+
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/time.h>
+
+#include "rseq.h"
+
+void test_cpu_pointer(void)
+{
+	cpu_set_t affinity, test_affinity;
+	int i;
+
+	sched_getaffinity(0, sizeof(affinity), &affinity);
+	CPU_ZERO(&test_affinity);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (CPU_ISSET(i, &affinity)) {
+			CPU_SET(i, &test_affinity);
+			sched_setaffinity(0, sizeof(test_affinity),
+					&test_affinity);
+			assert(sched_getcpu() == i);
+			assert(rseq_current_cpu() == i);
+			assert(rseq_current_cpu_raw() == i);
+			assert(rseq_cpu_start() == i);
+			CPU_CLR(i, &test_affinity);
+		}
+	}
+	sched_setaffinity(0, sizeof(affinity), &affinity);
+}
+
+int main(int argc, char **argv)
+{
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	printf("testing current cpu\n");
+	test_cpu_pointer();
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		goto init_thread_error;
+	}
+	return 0;
+
+init_thread_error:
+	return -1;
+}
diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c
new file mode 100644
index 000000000000..7b34d333d1f7
--- /dev/null
+++ b/tools/testing/selftests/rseq/param_test.c
@@ -0,0 +1,1285 @@
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syscall.h>
+#include <unistd.h>
+#include <poll.h>
+#include <sys/types.h>
+#include <signal.h>
+#include <errno.h>
+#include <stddef.h>
+
+#include "cpu-op.h"
+
+static inline pid_t gettid(void)
+{
+	return syscall(__NR_gettid);
+}
+
+#define NR_INJECT	9
+static int loop_cnt[NR_INJECT + 1];
+
+static int opt_modulo, quiet;
+
+static int opt_yield, opt_signal, opt_sleep,
+		opt_disable_rseq, opt_threads = 200,
+		opt_disable_mod = 0, opt_test = 's', opt_mb = 0;
+
+static long long opt_reps = 5000;
+
+static __thread __attribute__((tls_model("initial-exec"))) unsigned int signals_delivered;
+
+#ifndef BENCHMARK
+
+static __thread __attribute__((tls_model("initial-exec"))) unsigned int yield_mod_cnt, nr_abort;
+
+#define printf_verbose(fmt, ...)			\
+	do {						\
+		if (!quiet)				\
+			printf(fmt, ## __VA_ARGS__);	\
+	} while (0)
+
+#define RSEQ_INJECT_INPUT \
+	, [loop_cnt_1]"m"(loop_cnt[1]) \
+	, [loop_cnt_2]"m"(loop_cnt[2]) \
+	, [loop_cnt_3]"m"(loop_cnt[3]) \
+	, [loop_cnt_4]"m"(loop_cnt[4]) \
+	, [loop_cnt_5]"m"(loop_cnt[5]) \
+	, [loop_cnt_6]"m"(loop_cnt[6])
+
+#if defined(__x86_64__) || defined(__i386__)
+
+#define INJECT_ASM_REG	"eax"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"mov %[loop_cnt_" #n "], %%" INJECT_ASM_REG "\n\t" \
+	"test %%" INJECT_ASM_REG ",%%" INJECT_ASM_REG "\n\t" \
+	"jz 333f\n\t" \
+	"222:\n\t" \
+	"dec %%" INJECT_ASM_REG "\n\t" \
+	"jnz 222b\n\t" \
+	"333:\n\t"
+
+#elif defined(__ARMEL__)
+
+#define INJECT_ASM_REG	"r4"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"ldr " INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
+	"cmp " INJECT_ASM_REG ", #0\n\t" \
+	"beq 333f\n\t" \
+	"222:\n\t" \
+	"subs " INJECT_ASM_REG ", #1\n\t" \
+	"bne 222b\n\t" \
+	"333:\n\t"
+
+#elif __PPC__
+#define INJECT_ASM_REG	"r18"
+
+#define RSEQ_INJECT_CLOBBER \
+	, INJECT_ASM_REG
+
+#define RSEQ_INJECT_ASM(n) \
+	"lwz %%" INJECT_ASM_REG ", %[loop_cnt_" #n "]\n\t" \
+	"cmpwi %%" INJECT_ASM_REG ", 0\n\t" \
+	"beq 333f\n\t" \
+	"222:\n\t" \
+	"subic. %%" INJECT_ASM_REG ", %%" INJECT_ASM_REG ", 1\n\t" \
+	"bne 222b\n\t" \
+	"333:\n\t"
+#else
+#error unsupported target
+#endif
+
+#define RSEQ_INJECT_FAILED \
+	nr_abort++;
+
+#define RSEQ_INJECT_C(n) \
+{ \
+	int loc_i, loc_nr_loops = loop_cnt[n]; \
+	\
+	for (loc_i = 0; loc_i < loc_nr_loops; loc_i++) { \
+		barrier(); \
+	} \
+	if (loc_nr_loops == -1 && opt_modulo) { \
+		if (yield_mod_cnt == opt_modulo - 1) { \
+			if (opt_sleep > 0) \
+				poll(NULL, 0, opt_sleep); \
+			if (opt_yield) \
+				sched_yield(); \
+			if (opt_signal) \
+				raise(SIGUSR1); \
+			yield_mod_cnt = 0; \
+		} else { \
+			yield_mod_cnt++; \
+		} \
+	} \
+}
+
+#else
+
+#define printf_verbose(fmt, ...)
+
+#endif /* BENCHMARK */
+
+#include "rseq.h"
+
+struct percpu_lock_entry {
+	intptr_t v;
+} __attribute__((aligned(128)));
+
+struct percpu_lock {
+	struct percpu_lock_entry c[CPU_SETSIZE];
+};
+
+struct test_data_entry {
+	intptr_t count;
+} __attribute__((aligned(128)));
+
+struct spinlock_test_data {
+	struct percpu_lock lock;
+	struct test_data_entry c[CPU_SETSIZE];
+};
+
+struct spinlock_thread_test_data {
+	struct spinlock_test_data *data;
+	long long reps;
+	int reg;
+};
+
+struct inc_test_data {
+	struct test_data_entry c[CPU_SETSIZE];
+};
+
+struct inc_thread_test_data {
+	struct inc_test_data *data;
+	long long reps;
+	int reg;
+};
+
+struct percpu_list_node {
+	intptr_t data;
+	struct percpu_list_node *next;
+};
+
+struct percpu_list_entry {
+	struct percpu_list_node *head;
+} __attribute__((aligned(128)));
+
+struct percpu_list {
+	struct percpu_list_entry c[CPU_SETSIZE];
+};
+
+#define BUFFER_ITEM_PER_CPU	100
+
+struct percpu_buffer_node {
+	intptr_t data;
+};
+
+struct percpu_buffer_entry {
+	intptr_t offset;
+	intptr_t buflen;
+	struct percpu_buffer_node **array;
+} __attribute__((aligned(128)));
+
+struct percpu_buffer {
+	struct percpu_buffer_entry c[CPU_SETSIZE];
+};
+
+#define MEMCPY_BUFFER_ITEM_PER_CPU	100
+
+struct percpu_memcpy_buffer_node {
+	intptr_t data1;
+	uint64_t data2;
+};
+
+struct percpu_memcpy_buffer_entry {
+	intptr_t offset;
+	intptr_t buflen;
+	struct percpu_memcpy_buffer_node *array;
+} __attribute__((aligned(128)));
+
+struct percpu_memcpy_buffer {
+	struct percpu_memcpy_buffer_entry c[CPU_SETSIZE];
+};
+
+/* A simple percpu spinlock.  Returns the cpu the lock was acquired on. */
+static int rseq_percpu_lock(struct percpu_lock *lock)
+{
+	int cpu;
+
+	for (;;) {
+		int ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_cmpeqv_storev(&lock->c[cpu].v,
+				0, 1, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			continue;	/* Retry. */
+#endif
+	slowpath:
+		__attribute__((unused));
+		/* Fallback on cpu_opv system call. */
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpeqv_storev(&lock->c[cpu].v, 0, 1, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	/*
+	 * Acquire semantics when taking lock after control dependency.
+	 * Matches rseq_smp_store_release().
+	 */
+	rseq_smp_acquire__after_ctrl_dep();
+	return cpu;
+}
+
+static void rseq_percpu_unlock(struct percpu_lock *lock, int cpu)
+{
+	assert(lock->c[cpu].v == 1);
+	/*
+	 * Release lock, with release semantics. Matches
+	 * rseq_smp_acquire__after_ctrl_dep().
+	 */
+	rseq_smp_store_release(&lock->c[cpu].v, 0);
+}
+
+void *test_percpu_spinlock_thread(void *arg)
+{
+	struct spinlock_thread_test_data *thread_data = arg;
+	struct spinlock_test_data *data = thread_data->data;
+	int cpu;
+	long long i, reps;
+
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_register_current_thread())
+		abort();
+	reps = thread_data->reps;
+	for (i = 0; i < reps; i++) {
+		cpu = rseq_percpu_lock(&data->lock);
+		data->c[cpu].count++;
+		rseq_percpu_unlock(&data->lock, cpu);
+#ifndef BENCHMARK
+		if (i != 0 && reps >= 10 && !(i % (reps / 10)))
+			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
+#endif
+	}
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_unregister_current_thread())
+		abort();
+	return NULL;
+}
+
+/*
+ * A simple test which implements a sharded counter using a per-cpu
+ * lock.  Obviously real applications might prefer to simply use a
+ * per-cpu increment; however, this is reasonable for a test and the
+ * lock can be extended to synchronize more complicated operations.
+ */
+void test_percpu_spinlock(void)
+{
+	const int num_threads = opt_threads;
+	int i, ret;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct spinlock_test_data data;
+	struct spinlock_thread_test_data thread_data[num_threads];
+
+	memset(&data, 0, sizeof(data));
+	for (i = 0; i < num_threads; i++) {
+		thread_data[i].reps = opt_reps;
+		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
+			thread_data[i].reg = 1;
+		else
+			thread_data[i].reg = 0;
+		thread_data[i].data = &data;
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_spinlock_thread, &thread_data[i]);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)opt_reps * num_threads);
+}
+
+void *test_percpu_inc_thread(void *arg)
+{
+	struct inc_thread_test_data *thread_data = arg;
+	struct inc_test_data *data = thread_data->data;
+	long long i, reps;
+
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_register_current_thread())
+		abort();
+	reps = thread_data->reps;
+	for (i = 0; i < reps; i++) {
+		int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+		/* Try fast path. */
+		cpu = rseq_cpu_start();
+		ret = rseq_addv(&data->c[cpu].count, 1, cpu);
+		if (likely(!ret))
+			goto next;
+#endif
+	slowpath:
+		__attribute__((unused));
+		for (;;) {
+			/* Fallback on cpu_opv system call. */
+			cpu = rseq_current_cpu();
+			ret = cpu_op_addv(&data->c[cpu].count, 1, cpu);
+			if (likely(!ret))
+				break;
+			assert(ret >= 0 || errno == EAGAIN);
+		}
+	next:
+		__attribute__((unused));
+#ifndef BENCHMARK
+		if (i != 0 && reps >= 10 && !(i % (reps / 10)))
+			printf_verbose("tid %d: count %lld\n", (int) gettid(), i);
+#endif
+	}
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && thread_data->reg
+			&& rseq_unregister_current_thread())
+		abort();
+	return NULL;
+}
+
+void test_percpu_inc(void)
+{
+	const int num_threads = opt_threads;
+	int i, ret;
+	uint64_t sum;
+	pthread_t test_threads[num_threads];
+	struct inc_test_data data;
+	struct inc_thread_test_data thread_data[num_threads];
+
+	memset(&data, 0, sizeof(data));
+	for (i = 0; i < num_threads; i++) {
+		thread_data[i].reps = opt_reps;
+		if (opt_disable_mod <= 0 || (i % opt_disable_mod))
+			thread_data[i].reg = 1;
+		else
+			thread_data[i].reg = 0;
+		thread_data[i].data = &data;
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_inc_thread, &thread_data[i]);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	sum = 0;
+	for (i = 0; i < CPU_SETSIZE; i++)
+		sum += data.c[i].count;
+
+	assert(sum == (uint64_t)opt_reps * num_threads);
+}
+
+int percpu_list_push(struct percpu_list *list, struct percpu_list_node *node)
+{
+	intptr_t *targetptr, newval, expect;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load list->c[cpu].head with single-copy atomicity. */
+	expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+	newval = (intptr_t)node;
+	targetptr = (intptr_t *)&list->c[cpu].head;
+	node->next = (struct percpu_list_node *)expect;
+	ret = rseq_cmpeqv_storev(targetptr, expect, newval, cpu);
+	if (likely(!ret))
+		return cpu;
+#endif
+	/* Fallback on cpu_opv system call. */
+slowpath:
+	__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load list->c[cpu].head with single-copy atomicity. */
+		expect = (intptr_t)READ_ONCE(list->c[cpu].head);
+		newval = (intptr_t)node;
+		targetptr = (intptr_t *)&list->c[cpu].head;
+		node->next = (struct percpu_list_node *)expect;
+		ret = cpu_op_cmpeqv_storev(targetptr, expect, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return cpu;
+}
+
+/*
+ * Unlike a traditional lock-less linked list, the availability of an
+ * rseq primitive allows us to implement pop without concerns over
+ * ABA-type races.
+ */
+struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
+{
+	struct percpu_list_node *head;
+	int cpu, ret;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	ret = rseq_cmpnev_storeoffp_load((intptr_t *)&list->c[cpu].head,
+		(intptr_t)NULL,
+		offsetof(struct percpu_list_node, next),
+		(intptr_t *)&head, cpu);
+	if (likely(!ret))
+		return head;
+	if (ret > 0)
+		return NULL;
+#endif
+	/* Fallback on cpu_opv system call. */
+	slowpath:
+		__attribute__((unused));
+	for (;;) {
+		cpu = rseq_current_cpu();
+		ret = cpu_op_cmpnev_storeoffp_load(
+			(intptr_t *)&list->c[cpu].head,
+			(intptr_t)NULL,
+			offsetof(struct percpu_list_node, next),
+			(intptr_t *)&head, cpu);
+		if (likely(!ret))
+			break;
+		if (ret > 0)
+			return NULL;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_list_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_list *list = (struct percpu_list *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_list_node *node = percpu_list_pop(list);
+
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (node)
+			percpu_list_push(list, node);
+	}
+
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu linked list from many threads.  */
+void test_percpu_list(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_list list;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&list, 0, sizeof(list));
+
+	/* Generate list entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		for (j = 1; j <= 100; j++) {
+			struct percpu_list_node *node;
+
+			expected_sum += j;
+
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			node->next = list.c[i].head;
+			list.c[i].head = node;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_list_thread, &list);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_list_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_list_pop(&list))) {
+			sum += node->data;
+			free(node);
+		}
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+bool percpu_buffer_push(struct percpu_buffer *buffer,
+		struct percpu_buffer_node *node)
+{
+	intptr_t *targetptr_spec, newval_spec;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == buffer->c[cpu].buflen) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	newval_spec = (intptr_t)node;
+	targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
+	newval_final = offset + 1;
+	targetptr_final = &buffer->c[cpu].offset;
+	if (opt_mb)
+		ret = rseq_cmpeqv_trystorev_storev_release(targetptr_final,
+			offset, targetptr_spec, newval_spec,
+			newval_final, cpu);
+	else
+		ret = rseq_cmpeqv_trystorev_storev(targetptr_final,
+			offset, targetptr_spec, newval_spec,
+			newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == buffer->c[cpu].buflen)
+			return false;
+		newval_spec = (intptr_t)node;
+		targetptr_spec = (intptr_t *)&buffer->c[cpu].array[offset];
+		newval_final = offset + 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		if (opt_mb)
+			ret = cpu_op_cmpeqv_storev_mb_storev(targetptr_final,
+				offset, targetptr_spec, newval_spec,
+				newval_final, cpu);
+		else
+			ret = cpu_op_cmpeqv_storev_storev(targetptr_final,
+				offset, targetptr_spec, newval_spec,
+				newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+struct percpu_buffer_node *percpu_buffer_pop(struct percpu_buffer *buffer)
+{
+	struct percpu_buffer_node *head;
+	intptr_t *targetptr, newval;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == 0) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return NULL;
+	}
+	head = buffer->c[cpu].array[offset - 1];
+	newval = offset - 1;
+	targetptr = (intptr_t *)&buffer->c[cpu].offset;
+	ret = rseq_cmpeqv_cmpeqv_storev(targetptr, offset,
+		(intptr_t *)&buffer->c[cpu].array[offset - 1], (intptr_t)head,
+		newval, cpu);
+	if (likely(!ret))
+		return head;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == 0)
+			return NULL;
+		head = buffer->c[cpu].array[offset - 1];
+		newval = offset - 1;
+		targetptr = (intptr_t *)&buffer->c[cpu].offset;
+		ret = cpu_op_cmpeqv_cmpeqv_storev(targetptr, offset,
+			(intptr_t *)&buffer->c[cpu].array[offset - 1],
+			(intptr_t)head, newval, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return head;
+}
+
+void *test_percpu_buffer_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_buffer *buffer = (struct percpu_buffer *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_buffer_node *node = percpu_buffer_pop(buffer);
+
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (node) {
+			if (!percpu_buffer_push(buffer, node)) {
+				/* Should increase buffer size. */
+				abort();
+			}
+		}
+	}
+
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu buffer from many threads.  */
+void test_percpu_buffer(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_buffer buffer;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&buffer, 0, sizeof(buffer));
+
+	/* Generate buffer entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		/* Worst-case is every item in same CPU. */
+		buffer.c[i].array =
+			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
+				* BUFFER_ITEM_PER_CPU);
+		assert(buffer.c[i].array);
+		buffer.c[i].buflen = CPU_SETSIZE * BUFFER_ITEM_PER_CPU;
+		for (j = 1; j <= BUFFER_ITEM_PER_CPU; j++) {
+			struct percpu_buffer_node *node;
+
+			expected_sum += j;
+
+			/*
+			 * We could theoretically put the word-sized
+			 * "data" directly in the buffer. However, we
+			 * want to model objects that would not fit
+			 * within a single word, so allocate an object
+			 * for each node.
+			 */
+			node = malloc(sizeof(*node));
+			assert(node);
+			node->data = j;
+			buffer.c[i].array[j - 1] = node;
+			buffer.c[i].offset++;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_buffer_thread, &buffer);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_buffer_node *node;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while ((node = percpu_buffer_pop(&buffer))) {
+			sum += node->data;
+			free(node);
+		}
+		free(buffer.c[i].array);
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+bool percpu_memcpy_buffer_push(struct percpu_memcpy_buffer *buffer,
+		struct percpu_memcpy_buffer_node item)
+{
+	char *destptr, *srcptr;
+	size_t copylen;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == buffer->c[cpu].buflen) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	destptr = (char *)&buffer->c[cpu].array[offset];
+	srcptr = (char *)&item;
+	copylen = sizeof(item);
+	newval_final = offset + 1;
+	targetptr_final = &buffer->c[cpu].offset;
+	if (opt_mb)
+		ret = rseq_cmpeqv_trymemcpy_storev_release(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+	else
+		ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == buffer->c[cpu].buflen)
+			return false;
+		destptr = (char *)&buffer->c[cpu].array[offset];
+		srcptr = (char *)&item;
+		copylen = sizeof(item);
+		newval_final = offset + 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		/* copylen must be <= PAGE_SIZE. */
+		if (opt_mb)
+			ret = cpu_op_cmpeqv_memcpy_mb_storev(targetptr_final,
+				offset, destptr, srcptr, copylen,
+				newval_final, cpu);
+		else
+			ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
+				offset, destptr, srcptr, copylen,
+				newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+bool percpu_memcpy_buffer_pop(struct percpu_memcpy_buffer *buffer,
+		struct percpu_memcpy_buffer_node *item)
+{
+	char *destptr, *srcptr;
+	size_t copylen;
+	intptr_t *targetptr_final, newval_final;
+	int cpu, ret;
+	intptr_t offset;
+
+#ifndef SKIP_FASTPATH
+	/* Try fast path. */
+	cpu = rseq_cpu_start();
+	/* Load offset with single-copy atomicity. */
+	offset = READ_ONCE(buffer->c[cpu].offset);
+	if (offset == 0) {
+		if (unlikely(cpu != rseq_current_cpu_raw()))
+			goto slowpath;
+		return false;
+	}
+	destptr = (char *)item;
+	srcptr = (char *)&buffer->c[cpu].array[offset - 1];
+	copylen = sizeof(*item);
+	newval_final = offset - 1;
+	targetptr_final = &buffer->c[cpu].offset;
+	ret = rseq_cmpeqv_trymemcpy_storev(targetptr_final,
+		offset, destptr, srcptr, copylen,
+		newval_final, cpu);
+	if (likely(!ret))
+		return true;
+#endif
+slowpath:
+	__attribute__((unused));
+	/* Fallback on cpu_opv system call. */
+	for (;;) {
+		cpu = rseq_current_cpu();
+		/* Load offset with single-copy atomicity. */
+		offset = READ_ONCE(buffer->c[cpu].offset);
+		if (offset == 0)
+			return false;
+		destptr = (char *)item;
+		srcptr = (char *)&buffer->c[cpu].array[offset - 1];
+		copylen = sizeof(*item);
+		newval_final = offset - 1;
+		targetptr_final = &buffer->c[cpu].offset;
+		/* copylen must be <= PAGE_SIZE. */
+		ret = cpu_op_cmpeqv_memcpy_storev(targetptr_final,
+			offset, destptr, srcptr, copylen,
+			newval_final, cpu);
+		if (likely(!ret))
+			break;
+		assert(ret >= 0 || errno == EAGAIN);
+	}
+	return true;
+}
+
+void *test_percpu_memcpy_buffer_thread(void *arg)
+{
+	long long i, reps;
+	struct percpu_memcpy_buffer *buffer = (struct percpu_memcpy_buffer *)arg;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		abort();
+
+	reps = opt_reps;
+	for (i = 0; i < reps; i++) {
+		struct percpu_memcpy_buffer_node item;
+		bool result;
+
+		result = percpu_memcpy_buffer_pop(buffer, &item);
+		if (opt_yield)
+			sched_yield();  /* encourage shuffling */
+		if (result) {
+			if (!percpu_memcpy_buffer_push(buffer, item)) {
+				/* Should increase buffer size. */
+				abort();
+			}
+		}
+	}
+
+	printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n",
+		(int) gettid(), nr_abort, signals_delivered);
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+
+	return NULL;
+}
+
+/* Simultaneous modification to a per-cpu buffer from many threads.  */
+void test_percpu_memcpy_buffer(void)
+{
+	const int num_threads = opt_threads;
+	int i, j, ret;
+	uint64_t sum = 0, expected_sum = 0;
+	struct percpu_memcpy_buffer buffer;
+	pthread_t test_threads[num_threads];
+	cpu_set_t allowed_cpus;
+
+	memset(&buffer, 0, sizeof(buffer));
+
+	/* Generate buffer entries for every usable cpu. */
+	sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+		/* Worst-case is every item in same CPU. */
+		buffer.c[i].array =
+			malloc(sizeof(*buffer.c[i].array) * CPU_SETSIZE
+				* MEMCPY_BUFFER_ITEM_PER_CPU);
+		assert(buffer.c[i].array);
+		buffer.c[i].buflen = CPU_SETSIZE * MEMCPY_BUFFER_ITEM_PER_CPU;
+		for (j = 1; j <= MEMCPY_BUFFER_ITEM_PER_CPU; j++) {
+			expected_sum += 2 * j + 1;
+
+			/*
+			 * Unlike the pointer-based buffer test, each
+			 * item here holds two fields and does not fit
+			 * within a single word, so it is stored inline
+			 * in the per-cpu array and moved with memcpy
+			 * rather than allocated separately.
+			 */
+			buffer.c[i].array[j - 1].data1 = j;
+			buffer.c[i].array[j - 1].data2 = j + 1;
+			buffer.c[i].offset++;
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_create(&test_threads[i], NULL,
+			test_percpu_memcpy_buffer_thread, &buffer);
+		if (ret) {
+			errno = ret;
+			perror("pthread_create");
+			abort();
+		}
+	}
+
+	for (i = 0; i < num_threads; i++) {
+		ret = pthread_join(test_threads[i], NULL);
+		if (ret) {
+			errno = ret;
+			perror("pthread_join");
+			abort();
+		}
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		cpu_set_t pin_mask;
+		struct percpu_memcpy_buffer_node item;
+
+		if (!CPU_ISSET(i, &allowed_cpus))
+			continue;
+
+		CPU_ZERO(&pin_mask);
+		CPU_SET(i, &pin_mask);
+		sched_setaffinity(0, sizeof(pin_mask), &pin_mask);
+
+		while (percpu_memcpy_buffer_pop(&buffer, &item)) {
+			sum += item.data1;
+			sum += item.data2;
+		}
+		free(buffer.c[i].array);
+	}
+
+	/*
+	 * All entries should now be accounted for (unless some external
+	 * actor is interfering with our allowed affinity while this
+	 * test is running).
+	 */
+	assert(sum == expected_sum);
+}
+
+static void test_signal_interrupt_handler(int signo)
+{
+	signals_delivered++;
+}
+
+static int set_signal_handler(void)
+{
+	int ret = 0;
+	struct sigaction sa;
+	sigset_t sigset;
+
+	ret = sigemptyset(&sigset);
+	if (ret < 0) {
+		perror("sigemptyset");
+		return ret;
+	}
+
+	sa.sa_handler = test_signal_interrupt_handler;
+	sa.sa_mask = sigset;
+	sa.sa_flags = 0;
+	ret = sigaction(SIGUSR1, &sa, NULL);
+	if (ret < 0) {
+		perror("sigaction");
+		return ret;
+	}
+
+	printf_verbose("Signal handler set for SIGUSR1\n");
+
+	return ret;
+}
+
+static void show_usage(int argc, char **argv)
+{
+	printf("Usage : %s <OPTIONS>\n",
+		argv[0]);
+	printf("OPTIONS:\n");
+	printf("	[-1 loops] Number of loops for delay injection 1\n");
+	printf("	[-2 loops] Number of loops for delay injection 2\n");
+	printf("	[-3 loops] Number of loops for delay injection 3\n");
+	printf("	[-4 loops] Number of loops for delay injection 4\n");
+	printf("	[-5 loops] Number of loops for delay injection 5\n");
+	printf("	[-6 loops] Number of loops for delay injection 6\n");
+	printf("	[-7 loops] Number of loops for delay injection 7 (-1 to enable -m)\n");
+	printf("	[-8 loops] Number of loops for delay injection 8 (-1 to enable -m)\n");
+	printf("	[-9 loops] Number of loops for delay injection 9 (-1 to enable -m)\n");
+	printf("	[-m N] Yield/sleep/kill every modulo N (default 0: disabled) (>= 0)\n");
+	printf("	[-y] Yield\n");
+	printf("	[-k] Kill thread with signal\n");
+	printf("	[-s S] S: =0: disabled (default), >0: sleep time (ms)\n");
+	printf("	[-t N] Number of threads (default 200)\n");
+	printf("	[-r N] Number of repetitions per thread (default 5000)\n");
+	printf("	[-d] Disable rseq system call (no initialization)\n");
+	printf("	[-D M] Disable rseq for every Mth thread\n");
+	printf("	[-T test] Choose test: (s)pinlock, (l)ist, (b)uffer, (m)emcpy, (i)ncrement\n");
+	printf("	[-M] Push into buffer and memcpy buffer with memory barriers.\n");
+	printf("	[-q] Quiet output.\n");
+	printf("	[-h] Show this help.\n");
+	printf("\n");
+}
+
+int main(int argc, char **argv)
+{
+	int i;
+
+	for (i = 1; i < argc; i++) {
+		if (argv[i][0] != '-')
+			continue;
+		switch (argv[i][1]) {
+		case '1':
+		case '2':
+		case '3':
+		case '4':
+		case '5':
+		case '6':
+		case '7':
+		case '8':
+		case '9':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			loop_cnt[argv[i][1] - '0'] = atol(argv[i + 1]);
+			i++;
+			break;
+		case 'm':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_modulo = atol(argv[i + 1]);
+			if (opt_modulo < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 's':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_sleep = atol(argv[i + 1]);
+			if (opt_sleep < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'y':
+			opt_yield = 1;
+			break;
+		case 'k':
+			opt_signal = 1;
+			break;
+		case 'd':
+			opt_disable_rseq = 1;
+			break;
+		case 'D':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_disable_mod = atol(argv[i + 1]);
+			if (opt_disable_mod < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 't':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_threads = atol(argv[i + 1]);
+			if (opt_threads < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'r':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_reps = atoll(argv[i + 1]);
+			if (opt_reps < 0) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'h':
+			show_usage(argc, argv);
+			goto end;
+		case 'T':
+			if (argc < i + 2) {
+				show_usage(argc, argv);
+				goto error;
+			}
+			opt_test = *argv[i + 1];
+			switch (opt_test) {
+			case 's':
+			case 'l':
+			case 'i':
+			case 'b':
+			case 'm':
+				break;
+			default:
+				show_usage(argc, argv);
+				goto error;
+			}
+			i++;
+			break;
+		case 'q':
+			quiet = 1;
+			break;
+		case 'M':
+			opt_mb = 1;
+			break;
+		default:
+			show_usage(argc, argv);
+			goto error;
+		}
+	}
+
+	if (set_signal_handler())
+		goto error;
+
+	if (!opt_disable_rseq && rseq_register_current_thread())
+		goto error;
+	switch (opt_test) {
+	case 's':
+		printf_verbose("spinlock\n");
+		test_percpu_spinlock();
+		break;
+	case 'l':
+		printf_verbose("linked list\n");
+		test_percpu_list();
+		break;
+	case 'b':
+		printf_verbose("buffer\n");
+		test_percpu_buffer();
+		break;
+	case 'm':
+		printf_verbose("memcpy buffer\n");
+		test_percpu_memcpy_buffer();
+		break;
+	case 'i':
+		printf_verbose("counter increment\n");
+		test_percpu_inc();
+		break;
+	}
+	if (!opt_disable_rseq && rseq_unregister_current_thread())
+		abort();
+end:
+	return 0;
+
+error:
+	return -1;
+}
diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
new file mode 100644
index 000000000000..d2e9f07d569a
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -0,0 +1,535 @@
+/*
+ * rseq-arm.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#define RSEQ_SIG	0x53053053
+
+#define rseq_smp_mb()	__asm__ __volatile__ ("dmb" : : : "memory")
+#define rseq_smp_rmb()	__asm__ __volatile__ ("dmb" : : : "memory")
+#define rseq_smp_wmb()	__asm__ __volatile__ ("dmb" : : : "memory")
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_mb();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_mb();							\
+	WRITE_ONCE(*p, v);						\
+} while (0)
+
+#define RSEQ_ASM_DEFINE_TABLE(section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		__rseq_str(label) ":\n\t"				\
+		RSEQ_INJECT_ASM(1)					\
+		"adr r0, " __rseq_str(cs_label) "\n\t"			\
+		"str r0, [%[" __rseq_str(rseq_cs) "]]\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"ldr r0, %[" __rseq_str(current_cpu_id) "]\n\t"	\
+		"cmp %[" __rseq_str(cpu_id) "], r0\n\t"		\
+		"bne " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(table_label, label, section, sig,		\
+			teardown, abort_label, version, flags, start_ip,\
+			post_commit_offset, abort_ip)			\
+		__rseq_str(table_label) ":\n\t" 			\
+		".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".word " __rseq_str(RSEQ_SIG) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"b %l[" __rseq_str(abort_label) "]\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"b %l[" __rseq_str(cmpfail_label) "]\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expectnot], r0\n\t"
+		"beq 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"str r0, %[load]\n\t"
+		"add r0, %[voffp]\n\t"
+		"ldr r0, [r0]\n\t"
+		/* final store */
+		"str r0, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"Ir"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"add r0, %[count]\n\t"
+		/* final store */
+		"str r0, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [count]"Ir"(count)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"str %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"str %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		"dmb\n\t"	/* full mb provides store-release */
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"ldr r0, %[v2]\n\t"
+		"cmp %[expect2], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG, "", abort,
+			0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"str %[src], %[rseq_scratch0]\n\t"
+		"str %[dst], %[rseq_scratch1]\n\t"
+		"str %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"cmp %[len], #0\n\t" \
+		"beq 333f\n\t" \
+		"222:\n\t" \
+		"ldrb %%r0, [%[src]]\n\t" \
+		"strb %%r0, [%[dst]]\n\t" \
+		"adds %[src], #1\n\t" \
+		"adds %[dst], #1\n\t" \
+		"subs %[len], #1\n\t" \
+		"bne 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"ldr %[len], %[rseq_scratch2]\n\t"
+		"ldr %[dst], %[rseq_scratch1]\n\t"
+		"ldr %[src], %[rseq_scratch0]\n\t"
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"str %[src], %[rseq_scratch0]\n\t"
+		"str %[dst], %[rseq_scratch1]\n\t"
+		"str %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"ldr r0, %[v]\n\t"
+		"cmp %[expect], r0\n\t"
+		"bne 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"cmp %[len], #0\n\t" \
+		"beq 333f\n\t" \
+		"222:\n\t" \
+		"ldrb %%r0, [%[src]]\n\t" \
+		"strb %%r0, [%[dst]]\n\t" \
+		"adds %[src], #1\n\t" \
+		"adds %[dst], #1\n\t" \
+		"subs %[len], #1\n\t" \
+		"bne 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		"dmb\n\t"	/* full mb provides store-release */
+		/* final store */
+		"str %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"ldr %[len], %[rseq_scratch2]\n\t"
+		"ldr %[dst], %[rseq_scratch1]\n\t"
+		"ldr %[src], %[rseq_scratch0]\n\t"
+		"b 6f\n\t"
+		RSEQ_ASM_DEFINE_ABORT(3, 4, __rseq_failure, RSEQ_SIG,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			abort, 0x0, 0x0, 1b, 2b-1b, 4f)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			/* teardown */
+			"ldr %[len], %[rseq_scratch2]\n\t"
+			"ldr %[dst], %[rseq_scratch1]\n\t"
+			"ldr %[src], %[rseq_scratch0]\n\t",
+			cmpfail)
+		"6:\n\t"
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"r"(&__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "r0", "memory", "cc"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
diff --git a/tools/testing/selftests/rseq/rseq-ppc.h b/tools/testing/selftests/rseq/rseq-ppc.h
new file mode 100644
index 000000000000..bff0d97db0ff
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-ppc.h
@@ -0,0 +1,567 @@
+/*
+ * rseq-ppc.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ * (C) Copyright 2016 - Boqun Feng <boqun.feng@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#define RSEQ_SIG	0x53053053
+
+#define rseq_smp_mb()		__asm__ __volatile__ ("sync" : : : "memory")
+#define rseq_smp_lwsync()	__asm__ __volatile__ ("lwsync" : : : "memory")
+#define rseq_smp_rmb()		rseq_smp_lwsync()
+#define rseq_smp_wmb()		rseq_smp_lwsync()
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_lwsync();						\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_lwsync()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_lwsync();						\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+/*
+ * The __rseq_table section can be used by debuggers to better handle
+ * single-stepping through the restartable critical sections.
+ */
+
+#ifdef __PPC64__
+
+#define STORE_WORD	"std "
+#define LOAD_WORD	"ld "
+#define LOADX_WORD	"ldx "
+#define CMP_WORD	"cmpd "
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)			\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
+		".balign 32\n\t"						\
+		__rseq_str(label) ":\n\t"					\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
+		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
+		__rseq_str(label) ":\n\t"					\
+		RSEQ_INJECT_ASM(1)						\
+		"lis %%r17, (" __rseq_str(cs_label) ")@highest\n\t"		\
+		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@higher\n\t"	\
+		"rldicr %%r17, %%r17, 32, 31\n\t"				\
+		"oris %%r17, %%r17, (" __rseq_str(cs_label) ")@high\n\t"	\
+		"ori %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
+		"std %%r17, %[" __rseq_str(rseq_cs) "]\n\t"
+
+#else /* #ifdef __PPC64__ */
+
+#define STORE_WORD	"stw "
+#define LOAD_WORD	"lwz "
+#define LOADX_WORD	"lwzx "
+#define CMP_WORD	"cmpw "
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,			\
+			start_ip, post_commit_offset, abort_ip)			\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"		\
+		".balign 32\n\t"						\
+		__rseq_str(label) ":\n\t"					\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t"	\
+		/* 32-bit only supported on BE */				\
+		".long 0x0, " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)			\
+		__rseq_str(label) ":\n\t"					\
+		RSEQ_INJECT_ASM(1)						\
+		"lis %%r17, (" __rseq_str(cs_label) ")@ha\n\t"			\
+		"addi %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"		\
+		"stw %%r17, %[" __rseq_str(rseq_cs) "]\n\t"
+
+#endif /* #ifdef __PPC64__ */
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)			\
+		RSEQ_INJECT_ASM(2)						\
+		"lwz %%r17, %[" __rseq_str(current_cpu_id) "]\n\t"		\
+		"cmpw cr7, %[" __rseq_str(cpu_id) "], %%r17\n\t"		\
+		"bne- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label)	\
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
+		".long " __rseq_str(sig) "\n\t"					\
+		__rseq_str(label) ":\n\t"					\
+		teardown							\
+		"b %l[" __rseq_str(abort_label) "]\n\t"			\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label)	\
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"		\
+		__rseq_str(label) ":\n\t"					\
+		teardown							\
+		"b %l[" __rseq_str(cmpfail_label) "]\n\t"			\
+		".popsection\n\t"
+
+
+/*
+ * RSEQ_ASM_OPs: asm operations for rseq
+ * 	RSEQ_ASM_OP_R_*: uses hard-coded registers
+ * 	RSEQ_ASM_OP_* (else): uses no hard-coded registers (other than cr7)
+ */
+#define RSEQ_ASM_OP_CMPEQ(var, expect, label)					\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t"		\
+		"bne- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_OP_CMPNE(var, expectnot, label)				\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		CMP_WORD "cr7, %%r17, %[" __rseq_str(expectnot) "]\n\t"	\
+		"beq- cr7, " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_OP_STORE(value, var)						\
+		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"
+
+/* Load @var to r17 */
+#define RSEQ_ASM_OP_R_LOAD(var)							\
+		LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
+
+/* Store r17 to @var */
+#define RSEQ_ASM_OP_R_STORE(var)						\
+		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"
+
+/* Add @count to r17 */
+#define RSEQ_ASM_OP_R_ADD(count)						\
+		"add %%r17, %[" __rseq_str(count) "], %%r17\n\t"
+
+/* Load (r17 + voffp) to r17 */
+#define RSEQ_ASM_OP_R_LOADX(voffp)						\
+		LOADX_WORD "%%r17, %[" __rseq_str(voffp) "], %%r17\n\t"
+
+/* TODO: implement a faster memcpy. */
+#define RSEQ_ASM_OP_R_MEMCPY() \
+		"cmpdi %%r19, 0\n\t" \
+		"beq 333f\n\t" \
+		"addi %%r20, %%r20, -1\n\t" \
+		"addi %%r21, %%r21, -1\n\t" \
+		"222:\n\t" \
+		"lbzu %%r18, 1(%%r20)\n\t" \
+		"stbu %%r18, 1(%%r21)\n\t" \
+		"addi %%r19, %%r19, -1\n\t" \
+		"cmpdi %%r19, 0\n\t" \
+		"bne 222b\n\t" \
+		"333:\n\t" \
+
+#define RSEQ_ASM_OP_R_FINAL_STORE(var, post_commit_label)			\
+		STORE_WORD "%%r17, %[" __rseq_str(var) "]\n\t"			\
+		__rseq_str(post_commit_label) ":\n\t"
+
+#define RSEQ_ASM_OP_FINAL_STORE(value, var, post_commit_label)			\
+		STORE_WORD "%[" __rseq_str(value) "], %[" __rseq_str(var) "]\n\t"	\
+		__rseq_str(post_commit_label) ":\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v not equal to @expectnot */
+		RSEQ_ASM_OP_CMPNE(v, expectnot, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* load the value of @v */
+		RSEQ_ASM_OP_R_LOAD(v)
+		/* store it in @load */
+		RSEQ_ASM_OP_R_STORE(load)
+		/* dereference voffp(v) */
+		RSEQ_ASM_OP_R_LOADX(voffp)
+		/* final store the value at voffp(v) */
+		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"b"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* load the value of @v */
+		RSEQ_ASM_OP_R_LOAD(v)
+		/* add @count to it */
+		RSEQ_ASM_OP_R_ADD(count)
+		/* final store */
+		RSEQ_ASM_OP_R_FINAL_STORE(v, 2)
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"r"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		RSEQ_ASM_OP_STORE(newv2, v2)
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		RSEQ_ASM_OP_STORE(newv2, v2)
+		RSEQ_INJECT_ASM(5)
+		/* for 'release' */
+		"lwsync\n\t"
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* cmp @v2 equal to @expect2 */
+		RSEQ_ASM_OP_CMPEQ(v2, expect2, 5f)
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		/* setup for memcpy */
+		"mr %%r19, %[len]\n\t" \
+		"mr %%r20, %[src]\n\t" \
+		"mr %%r21, %[dst]\n\t" \
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		RSEQ_ASM_OP_R_MEMCPY()
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		/* setup for memcpy */
+		"mr %%r19, %[len]\n\t" \
+		"mr %%r20, %[src]\n\t" \
+		"mr %%r21, %[dst]\n\t" \
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		/* cmp cpuid */
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* cmp @v equal to @expect */
+		RSEQ_ASM_OP_CMPEQ(v, expect, 5f)
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		RSEQ_ASM_OP_R_MEMCPY()
+		RSEQ_INJECT_ASM(5)
+		/* for 'release' */
+		"lwsync\n\t"
+		/* final store */
+		RSEQ_ASM_OP_FINAL_STORE(newv, v, 2)
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "r17", "r18", "r19", "r20", "r21"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+#undef STORE_WORD
+#undef LOAD_WORD
+#undef LOADX_WORD
+#undef CMP_WORD
diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h
new file mode 100644
index 000000000000..7e4c21751c52
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq-x86.h
@@ -0,0 +1,898 @@
+/*
+ * rseq-x86.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdint.h>
+
+#define RSEQ_SIG	0x53053053
+
+#ifdef __x86_64__
+
+#define rseq_smp_mb()	__asm__ __volatile__ ("mfence" : : : "memory")
+#define rseq_smp_rmb()	barrier()
+#define rseq_smp_wmb()	barrier()
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	barrier();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	barrier();							\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		__rseq_str(label) ":\n\t"				\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".quad " __rseq_str(start_ip) ", " __rseq_str(post_commit_offset) ", " __rseq_str(abort_ip) "\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		__rseq_str(label) ":\n\t"				\
+		RSEQ_INJECT_ASM(1)					\
+		"leaq " __rseq_str(cs_label) "(%%rip), %%rax\n\t"	\
+		"movq %%rax, %[" __rseq_str(rseq_cs) "]\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
+		"jnz " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		/* Disassembler-friendly signature: nopl <sig>(%rip). */\
+		".byte 0x0f, 0x1f, 0x05\n\t"				\
+		".long " __rseq_str(sig) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
+		".popsection\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expectnot]\n\t"
+		"jz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"movq %[v], %%rax\n\t"
+		"movq %%rax, %[load]\n\t"
+		"addq %[voffp], %%rax\n\t"
+		"movq (%%rax), %%rax\n\t"
+		/* final store */
+		"movq %%rax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"er"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* final store */
+		"addq %[count], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"er"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movq %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* x86-64 is TSO. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	return rseq_cmpeqv_trystorev_storev(v, expect, v2, newv2,
+			newv, cpu);
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"cmpq %[v2], %[expect2]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint64_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movq %[src], %[rseq_scratch0]\n\t"
+		"movq %[dst], %[rseq_scratch1]\n\t"
+		"movq %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpq %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movq %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movq %[rseq_scratch2], %[len]\n\t"
+		"movq %[rseq_scratch1], %[dst]\n\t"
+		"movq %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movq %[rseq_scratch2], %[len]\n\t"
+			"movq %[rseq_scratch1], %[dst]\n\t"
+			"movq %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movq %[rseq_scratch2], %[len]\n\t"
+			"movq %[rseq_scratch1], %[dst]\n\t"
+			"movq %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* x86-64 is TSO. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	return rseq_cmpeqv_trymemcpy_storev(v, expect, dst, src,
+			len, newv, cpu);
+}
+
+#elif __i386__
+
+/*
+ * Support older 32-bit architectures that do not implement fence
+ * instructions.
+ */
+#define rseq_smp_mb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+#define rseq_smp_rmb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+#define rseq_smp_wmb()	\
+	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
+
+#define rseq_smp_load_acquire(p)					\
+__extension__ ({							\
+	__typeof(*p) ____p1 = RSEQ_READ_ONCE(*p);			\
+	rseq_smp_mb();							\
+	____p1;								\
+})
+
+#define rseq_smp_acquire__after_ctrl_dep()	rseq_smp_rmb()
+
+#define rseq_smp_store_release(p, v)					\
+do {									\
+	rseq_smp_mb();							\
+	RSEQ_WRITE_ONCE(*p, v);						\
+} while (0)
+
+/*
+ * Use eax as scratch register and take memory operands as input to
+ * lessen register pressure. This is especially needed when compiling
+ * at -O0.
+ */
+#define RSEQ_ASM_DEFINE_TABLE(label, section, version, flags,		\
+			start_ip, post_commit_offset, abort_ip)		\
+		".pushsection " __rseq_str(section) ", \"aw\"\n\t"	\
+		".balign 32\n\t"					\
+		__rseq_str(label) ":\n\t"				\
+		".long " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
+		".long " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
+		".popsection\n\t"
+
+#define RSEQ_ASM_STORE_RSEQ_CS(label, cs_label, rseq_cs)		\
+		__rseq_str(label) ":\n\t"				\
+		RSEQ_INJECT_ASM(1)					\
+		"movl $" __rseq_str(cs_label) ", %[rseq_cs]\n\t"
+
+#define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label)		\
+		RSEQ_INJECT_ASM(2)					\
+		"cmpl %[" __rseq_str(cpu_id) "], %[" __rseq_str(current_cpu_id) "]\n\t" \
+		"jnz " __rseq_str(label) "\n\t"
+
+#define RSEQ_ASM_DEFINE_ABORT(label, section, sig, teardown, abort_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		/* Disassembler-friendly signature: nopl <sig>. */\
+		".byte 0x0f, 0x1f, 0x05\n\t"				\
+		".long " __rseq_str(sig) "\n\t"			\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(abort_label) "]\n\t"		\
+		".popsection\n\t"
+
+#define RSEQ_ASM_DEFINE_CMPFAIL(label, section, teardown, cmpfail_label) \
+		".pushsection " __rseq_str(section) ", \"ax\"\n\t"	\
+		__rseq_str(label) ":\n\t"				\
+		teardown						\
+		"jmp %l[" __rseq_str(cmpfail_label) "]\n\t"		\
+		".popsection\n\t"
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
+		off_t voffp, intptr_t *load, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expectnot]\n\t"
+		"jz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"movl %[v], %%eax\n\t"
+		"movl %%eax, %[load]\n\t"
+		"addl %[voffp], %%eax\n\t"
+		"movl (%%eax), %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(5)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expectnot]"r"(expectnot),
+		  [voffp]"ir"(voffp),
+		  [load]"m"(*load)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_addv(intptr_t *v, intptr_t count, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		/* final store */
+		"addl %[count], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [count]"ir"(count)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movl %[newv2], %%eax\n\t"
+		"movl %%eax, %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"m"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t newv2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %[v], %%eax\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try store */
+		"movl %[newv2], %[v2]\n\t"
+		RSEQ_INJECT_ASM(5)
+		"lock; addl $0,0(%%esp)\n\t"
+		/* final store */
+		"movl %[newv], %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* try store input */
+		  [v2]"m"(*v2),
+		  [newv2]"r"(newv2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"r"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
+		intptr_t *v2, intptr_t expect2, intptr_t newv,
+		int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"cmpl %[v], %[expect]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		"cmpl %[expect2], %[v2]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(5)
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG, "", abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure, "", cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* cmp2 input */
+		  [v2]"m"(*v2),
+		  [expect2]"r"(expect2),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"r"(expect),
+		  [newv]"m"(newv)
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* TODO: implement a faster memcpy. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movl %[src], %[rseq_scratch0]\n\t"
+		"movl %[dst], %[rseq_scratch1]\n\t"
+		"movl %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %%eax, %[v]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movl %[rseq_scratch2], %[len]\n\t"
+		"movl %[rseq_scratch1], %[dst]\n\t"
+		"movl %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"m"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+/* TODO: implement a faster memcpy. */
+static inline __attribute__((always_inline))
+int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
+		void *dst, void *src, size_t len, intptr_t newv,
+		int cpu)
+{
+	uint32_t rseq_scratch[3];
+
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, __rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
+		"movl %[src], %[rseq_scratch0]\n\t"
+		"movl %[dst], %[rseq_scratch1]\n\t"
+		"movl %[len], %[rseq_scratch2]\n\t"
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, rseq_cs)
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, 4f)
+		RSEQ_INJECT_ASM(3)
+		"movl %[expect], %%eax\n\t"
+		"cmpl %%eax, %[v]\n\t"
+		"jnz 5f\n\t"
+		RSEQ_INJECT_ASM(4)
+		/* try memcpy */
+		"test %[len], %[len]\n\t" \
+		"jz 333f\n\t" \
+		"222:\n\t" \
+		"movb (%[src]), %%al\n\t" \
+		"movb %%al, (%[dst])\n\t" \
+		"inc %[src]\n\t" \
+		"inc %[dst]\n\t" \
+		"dec %[len]\n\t" \
+		"jnz 222b\n\t" \
+		"333:\n\t" \
+		RSEQ_INJECT_ASM(5)
+		"lock; addl $0,0(%%esp)\n\t"
+		"movl %[newv], %%eax\n\t"
+		/* final store */
+		"movl %%eax, %[v]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(6)
+		/* teardown */
+		"movl %[rseq_scratch2], %[len]\n\t"
+		"movl %[rseq_scratch1], %[dst]\n\t"
+		"movl %[rseq_scratch0], %[src]\n\t"
+		RSEQ_ASM_DEFINE_ABORT(4, __rseq_failure, RSEQ_SIG,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			abort)
+		RSEQ_ASM_DEFINE_CMPFAIL(5, __rseq_failure,
+			"movl %[rseq_scratch2], %[len]\n\t"
+			"movl %[rseq_scratch1], %[dst]\n\t"
+			"movl %[rseq_scratch0], %[src]\n\t",
+			cmpfail)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]"r"(cpu),
+		  [current_cpu_id]"m"(__rseq_abi.cpu_id),
+		  [rseq_cs]"m"(__rseq_abi.rseq_cs),
+		  /* final store input */
+		  [v]"m"(*v),
+		  [expect]"m"(expect),
+		  [newv]"m"(newv),
+		  /* try memcpy input */
+		  [dst]"r"(dst),
+		  [src]"r"(src),
+		  [len]"r"(len),
+		  [rseq_scratch0]"m"(rseq_scratch[0]),
+		  [rseq_scratch1]"m"(rseq_scratch[1]),
+		  [rseq_scratch2]"m"(rseq_scratch[2])
+		  RSEQ_INJECT_INPUT
+		: "memory", "cc", "eax"
+		  RSEQ_INJECT_CLOBBER
+		: abort, cmpfail
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+cmpfail:
+	return 1;
+}
+
+#endif
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
new file mode 100644
index 000000000000..3db193c0afb0
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -0,0 +1,116 @@
+/*
+ * rseq.c
+ *
+ * Copyright (C) 2016 Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; only
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <syscall.h>
+#include <assert.h>
+#include <signal.h>
+
+#include "rseq.h"
+
+#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
+
+__attribute__((tls_model("initial-exec"))) __thread
+volatile struct rseq __rseq_abi = {
+	.cpu_id = -1,
+};
+
+static __attribute__((tls_model("initial-exec"))) __thread
+volatile int refcount;
+
+static void signal_off_save(sigset_t *oldset)
+{
+	sigset_t set;
+	int ret;
+
+	sigfillset(&set);
+	ret = pthread_sigmask(SIG_BLOCK, &set, oldset);
+	if (ret)
+		abort();
+}
+
+static void signal_restore(sigset_t oldset)
+{
+	int ret;
+
+	ret = pthread_sigmask(SIG_SETMASK, &oldset, NULL);
+	if (ret)
+		abort();
+}
+
+static int sys_rseq(volatile struct rseq *rseq_abi, uint32_t rseq_len,
+		int flags, uint32_t sig)
+{
+	return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig);
+}
+
+int rseq_register_current_thread(void)
+{
+	int rc, ret = 0;
+	sigset_t oldset;
+
+	signal_off_save(&oldset);
+	if (refcount++)
+		goto end;
+	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG);
+	if (!rc) {
+		assert(rseq_current_cpu_raw() >= 0);
+		goto end;
+	}
+	if (errno != EBUSY)
+		__rseq_abi.cpu_id = -2;
+	ret = -1;
+	refcount--;
+end:
+	signal_restore(oldset);
+	return ret;
+}
+
+int rseq_unregister_current_thread(void)
+{
+	int rc, ret = 0;
+	sigset_t oldset;
+
+	signal_off_save(&oldset);
+	if (--refcount)
+		goto end;
+	rc = sys_rseq(&__rseq_abi, sizeof(struct rseq),
+			RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+	if (!rc)
+		goto end;
+	ret = -1;
+end:
+	signal_restore(oldset);
+	return ret;
+}
+
+int32_t rseq_fallback_current_cpu(void)
+{
+	int32_t cpu;
+
+	cpu = sched_getcpu();
+	if (cpu < 0) {
+		perror("sched_getcpu()");
+		abort();
+	}
+	return cpu;
+}
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
new file mode 100644
index 000000000000..26c8ea01e940
--- /dev/null
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -0,0 +1,154 @@
+/*
+ * rseq.h
+ *
+ * (C) Copyright 2016 - Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RSEQ_H
+#define RSEQ_H
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <pthread.h>
+#include <signal.h>
+#include <sched.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sched.h>
+#include <linux/rseq.h>
+
+/*
+ * Empty code injection macros, override when testing.
+ * It is important to consider that the ASM injection macros need to be
+ * fully reentrant (e.g. do not modify the stack).
+ */
+#ifndef RSEQ_INJECT_ASM
+#define RSEQ_INJECT_ASM(n)
+#endif
+
+#ifndef RSEQ_INJECT_C
+#define RSEQ_INJECT_C(n)
+#endif
+
+#ifndef RSEQ_INJECT_INPUT
+#define RSEQ_INJECT_INPUT
+#endif
+
+#ifndef RSEQ_INJECT_CLOBBER
+#define RSEQ_INJECT_CLOBBER
+#endif
+
+#ifndef RSEQ_INJECT_FAILED
+#define RSEQ_INJECT_FAILED
+#endif
+
+extern __thread volatile struct rseq __rseq_abi;
+
+#define rseq_likely(x)		__builtin_expect(!!(x), 1)
+#define rseq_unlikely(x)	__builtin_expect(!!(x), 0)
+#define rseq_barrier()		__asm__ __volatile__("" : : : "memory")
+
+#define RSEQ_ACCESS_ONCE(x)	(*(__volatile__  __typeof__(x) *)&(x))
+#define RSEQ_WRITE_ONCE(x, v)	__extension__ ({ RSEQ_ACCESS_ONCE(x) = (v); })
+#define RSEQ_READ_ONCE(x)	RSEQ_ACCESS_ONCE(x)
+
+#define __rseq_str_1(x)	#x
+#define __rseq_str(x)		__rseq_str_1(x)
+
+#if defined(__x86_64__) || defined(__i386__)
+#include <rseq-x86.h>
+#elif defined(__ARMEL__)
+#include <rseq-arm.h>
+#elif defined(__PPC__)
+#include <rseq-ppc.h>
+#else
+#error unsupported target
+#endif
+
+/*
+ * Register rseq for the current thread. Each thread that uses
+ * restartable sequences must call this once before its first
+ * restartable sequence; a restartable sequence executed from a
+ * non-registered thread always fails. Registrations are reference
+ * counted per thread.
+ */
+int rseq_register_current_thread(void);
+
+/*
+ * Unregister rseq for current thread.
+ */
+int rseq_unregister_current_thread(void);
+
+/*
+ * Restartable sequence fallback for reading the current CPU number.
+ */
+int32_t rseq_fallback_current_cpu(void);
+
+/*
+ * Values returned can be either the current CPU number, -1 (rseq is
+ * uninitialized), or -2 (rseq initialization has failed).
+ */
+static inline int32_t rseq_current_cpu_raw(void)
+{
+	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id);
+}
+
+/*
+ * Returns a possible CPU number, which is typically the current CPU.
+ * The returned CPU number can be used to prepare for an rseq critical
+ * section, which will confirm whether the cpu number is indeed the
+ * current one, and whether rseq is initialized.
+ *
+ * The CPU number returned by rseq_cpu_start should always be validated
+ * by passing it to an rseq asm sequence, or by comparing it to the
+ * return value of rseq_current_cpu_raw() if the rseq asm sequence
+ * does not need to be invoked.
+ */
+static inline uint32_t rseq_cpu_start(void)
+{
+	return RSEQ_ACCESS_ONCE(__rseq_abi.cpu_id_start);
+}
+
+static inline uint32_t rseq_current_cpu(void)
+{
+	int32_t cpu;
+
+	cpu = rseq_current_cpu_raw();
+	if (rseq_unlikely(cpu < 0))
+		cpu = rseq_fallback_current_cpu();
+	return cpu;
+}
+
+/*
+ * rseq_prepare_unload() should be invoked by each thread using rseq_finish*()
+ * at least once between their last rseq_finish*() and library unload of the
+ * library defining the rseq critical section (struct rseq_cs). This also
+ * applies to use of rseq in code generated by JIT: rseq_prepare_unload()
+ * should be invoked at least once by each thread using rseq_finish*() before
+ * reclaim of the memory holding the struct rseq_cs.
+ */
+static inline void rseq_prepare_unload(void)
+{
+	__rseq_abi.rseq_cs = 0;
+}
+
+#endif  /* RSEQ_H */
diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
new file mode 100755
index 000000000000..c7475a2bef11
--- /dev/null
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -0,0 +1,124 @@
+#!/bin/bash
+
+EXTRA_ARGS=${@}
+
+OLDIFS="$IFS"
+IFS=$'\n'
+TEST_LIST=(
+	"-T s"
+	"-T l"
+	"-T b"
+	"-T b -M"
+	"-T m"
+	"-T m -M"
+	"-T i"
+)
+
+TEST_NAME=(
+	"spinlock"
+	"list"
+	"buffer"
+	"buffer with barrier"
+	"memcpy"
+	"memcpy with barrier"
+	"increment"
+)
+IFS="$OLDIFS"
+
+function do_tests()
+{
+	local i=0
+	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
+		echo "Running test ${TEST_NAME[$i]}"
+		./param_test ${TEST_LIST[$i]} ${@} ${EXTRA_ARGS} || exit 1
+		let "i++"
+	done
+}
+
+echo "Default parameters"
+do_tests
+
+echo "Loop injection: 10000 loops"
+
+OLDIFS="$IFS"
+IFS=$'\n'
+INJECT_LIST=(
+	"1"
+	"2"
+	"3"
+	"4"
+	"5"
+	"6"
+	"7"
+	"8"
+	"9"
+)
+IFS="$OLDIFS"
+
+NR_LOOPS=10000
+
+i=0
+while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
+	echo "Injecting at <${INJECT_LIST[$i]}>"
+	do_tests -${INJECT_LIST[i]} ${NR_LOOPS}
+	let "i++"
+done
+NR_LOOPS=
+
+function inject_blocking()
+{
+	OLDIFS="$IFS"
+	IFS=$'\n'
+	INJECT_LIST=(
+		"7"
+		"8"
+		"9"
+	)
+	IFS="$OLDIFS"
+
+	NR_LOOPS=-1
+
+	i=0
+	while [ "$i" -lt "${#INJECT_LIST[@]}" ]; do
+		echo "Injecting at <${INJECT_LIST[$i]}>"
+		do_tests -${INJECT_LIST[i]} -1 ${@}
+		let "i++"
+	done
+	NR_LOOPS=
+}
+
+echo "Yield injection (25%)"
+inject_blocking -m 4 -y -r 100
+
+echo "Yield injection (50%)"
+inject_blocking -m 2 -y -r 100
+
+echo "Yield injection (100%)"
+inject_blocking -m 1 -y -r 100
+
+echo "Kill injection (25%)"
+inject_blocking -m 4 -k -r 100
+
+echo "Kill injection (50%)"
+inject_blocking -m 2 -k -r 100
+
+echo "Kill injection (100%)"
+inject_blocking -m 1 -k -r 100
+
+echo "Sleep injection (1ms, 25%)"
+inject_blocking -m 4 -s 1 -r 100
+
+echo "Sleep injection (1ms, 50%)"
+inject_blocking -m 2 -s 1 -r 100
+
+echo "Sleep injection (1ms, 100%)"
+inject_blocking -m 1 -s 1 -r 100
+
+echo "Disable rseq for 25% threads"
+do_tests -D 4
+
+echo "Disable rseq for 50% threads"
+do_tests -D 2
+
+echo "Disable rseq"
+do_tests -d
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC PATCH for 4.15 14/14] Restartable sequences selftests: arm: workaround gcc asm size guess
  2017-11-06 20:56 ` Mathieu Desnoyers
@ 2017-11-06 20:56   ` Mathieu Desnoyers
  -1 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-06 20:56 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson
  Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Mathieu Desnoyers, Florian Weimer,
	Shuah Khan, linux-kselftest

Fixes assembler errors:
/tmp/cceKwI9a.s: Assembler messages:
/tmp/cceKwI9a.s:849: Error: co-processor offset out of range

with gcc prior to gcc-7. This can trigger if multiple rseq inline asm
statements are used within the same function.

My best guess at the cause of this issue is that gcc has a hard
time figuring out the actual size of the inline asm, and therefore
does not accurately compute the offsets from the program counter
at which literal values can be placed.
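
For illustration only (this is not part of the patch), the placement of
the workaround can be sketched in portable C. The names try_op() and
try_op_selfcheck() are hypothetical, and the empty asm goto body is a
placeholder assumed for this sketch; the real sequences are the ones in
rseq-arm.h. Why the empty statement helps remains a guess.

```c
#include <assert.h>

/* Empty volatile asm statement: emits no instructions by itself. */
#define rseq_workaround_gcc_asm_size_guess()	__asm__ __volatile__("")

/* Hypothetical stand-in for one rseq operation (not from rseq-arm.h). */
static inline __attribute__((always_inline))
int try_op(void)
{
	rseq_workaround_gcc_asm_size_guess();
	__asm__ __volatile__ goto (
		""	/* placeholder: a real rseq asm sequence would go here */
		: /* no outputs */
		: /* no inputs */
		: "memory"
		: abort
	);
	rseq_workaround_gcc_asm_size_guess();
	return 0;
abort:
	rseq_workaround_gcc_asm_size_guess();
	return -1;
}

/* The placeholder body never branches to abort, so the success path is taken. */
void try_op_selfcheck(void)
{
	assert(try_op() == 0);
}
```

The empty statement is placed before the asm goto and at the start of
every exit path, which is the same pattern the diff below applies to each
helper.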

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Hunter <ahh@google.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Florian Weimer <fweimer@redhat.com>
CC: Shuah Khan <shuah@kernel.org>
CC: linux-kselftest@vger.kernel.org
CC: linux-api@vger.kernel.org
---
 tools/testing/selftests/rseq/rseq-arm.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
index d2e9f07d569a..75371e4dfbfb 100644
--- a/tools/testing/selftests/rseq/rseq-arm.h
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -79,12 +79,15 @@ do {									\
 		teardown						\
 		"b %l[" __rseq_str(cmpfail_label) "]\n\t"
 
+#define rseq_workaround_gcc_asm_size_guess()	__asm__ __volatile__("")
+
 static inline __attribute__((always_inline))
 int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
 		int cpu)
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -115,11 +118,14 @@ int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -129,6 +135,7 @@ int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -164,11 +171,14 @@ int rseq_cmpnev_storeoffp_load(intptr_t *v, intptr_t expectnot,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -177,6 +187,7 @@ int rseq_addv(intptr_t *v, intptr_t count, int cpu)
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -203,8 +214,10 @@ int rseq_addv(intptr_t *v, intptr_t count, int cpu)
 		  RSEQ_INJECT_CLOBBER
 		: abort
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 }
@@ -216,6 +229,7 @@ int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -253,11 +267,14 @@ int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -268,6 +285,7 @@ int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -306,11 +324,14 @@ int rseq_cmpeqv_trystorev_storev_release(intptr_t *v, intptr_t expect,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -321,6 +342,7 @@ int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
 {
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		RSEQ_ASM_STORE_RSEQ_CS(1, 3f, rseq_cs)
@@ -359,11 +381,14 @@ int rseq_cmpeqv_cmpeqv_storev(intptr_t *v, intptr_t expect,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -376,6 +401,7 @@ int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
 
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		"str %[src], %[rseq_scratch0]\n\t"
@@ -442,11 +468,14 @@ int rseq_cmpeqv_trymemcpy_storev(intptr_t *v, intptr_t expect,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
 
@@ -459,6 +488,7 @@ int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
 
 	RSEQ_INJECT_C(9)
 
+	rseq_workaround_gcc_asm_size_guess();
 	__asm__ __volatile__ goto (
 		RSEQ_ASM_DEFINE_TABLE(__rseq_table, 0x0, 0x0, 1f, 2f-1f, 4f)
 		"str %[src], %[rseq_scratch0]\n\t"
@@ -526,10 +556,13 @@ int rseq_cmpeqv_trymemcpy_storev_release(intptr_t *v, intptr_t expect,
 		  RSEQ_INJECT_CLOBBER
 		: abort, cmpfail
 	);
+	rseq_workaround_gcc_asm_size_guess();
 	return 0;
 abort:
+	rseq_workaround_gcc_asm_size_guess();
 	RSEQ_INJECT_FAILED
 	return -1;
 cmpfail:
+	rseq_workaround_gcc_asm_size_guess();
 	return 1;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH for 4.15 10/14] cpu_opv: Wire up powerpc system call
  2017-11-06 20:56   ` Mathieu Desnoyers
@ 2017-11-07  0:37     ` Nicholas Piggin
  -1 siblings, 0 replies; 51+ messages in thread
From: Nicholas Piggin @ 2017-11-07  0:37 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, Andrew Morton, Johannes Weiner, Vlastimil Babka,
	linux-mm, LKML

On Mon,  6 Nov 2017 15:56:40 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
> index b1980fcd56d5..972a7d68c143 100644
> --- a/arch/powerpc/include/uapi/asm/unistd.h
> +++ b/arch/powerpc/include/uapi/asm/unistd.h
> @@ -396,5 +396,6 @@
>  #define __NR_kexec_file_load	382
>  #define __NR_statx		383
>  #define __NR_rseq		384
> +#define __NR_cpu_opv		385

Sorry for bike shedding, but could we invest a few more keystrokes to
make these names a bit more readable?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH for 4.15 10/14] cpu_opv: Wire up powerpc system call
  2017-11-07  0:37     ` Nicholas Piggin
@ 2017-11-07  0:47       ` Mathieu Desnoyers
  -1 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-07  0:47 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, Andrew Morton, Johannes Weiner, Vlastimil Babka,
	linux-mm, linux-kernel

----- On Nov 6, 2017, at 7:37 PM, Nicholas Piggin npiggin@gmail.com wrote:

> On Mon,  6 Nov 2017 15:56:40 -0500
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> diff --git a/arch/powerpc/include/uapi/asm/unistd.h
>> b/arch/powerpc/include/uapi/asm/unistd.h
>> index b1980fcd56d5..972a7d68c143 100644
>> --- a/arch/powerpc/include/uapi/asm/unistd.h
>> +++ b/arch/powerpc/include/uapi/asm/unistd.h
>> @@ -396,5 +396,6 @@
>>  #define __NR_kexec_file_load	382
>>  #define __NR_statx		383
>>  #define __NR_rseq		384
>> +#define __NR_cpu_opv		385
> 
> Sorry for bike shedding, but could we invest a few more keystrokes to
> make these names a bit more readable?

Whenever I try to make variables or function names more explicit, I can
literally feel my conscience (taking the form of an angry Peter Zijlstra)
breathing down my neck asking me to make them shorter. So I guess this is
where it becomes a question of taste.

I think the "rseq" syscall name is short, to the point, and should be mostly
fine.

For "cpu_opv", it was just a short name that fit the bill until a
better idea would come.

I'm open to suggestions. Any color preference ? ;-)

Thanks,

Mathieu


> 
> Thanks,
> Nick

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH for 4.15 10/14] cpu_opv: Wire up powerpc system call
  2017-11-07  0:47       ` Mathieu Desnoyers
@ 2017-11-07  1:21         ` Nicholas Piggin
  -1 siblings, 0 replies; 51+ messages in thread
From: Nicholas Piggin @ 2017-11-07  1:21 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, Andrew Morton, Johannes Weiner, Vlastimil Babka,
	linux-mm, linux-kernel

On Tue, 7 Nov 2017 00:47:17 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Nov 6, 2017, at 7:37 PM, Nicholas Piggin npiggin@gmail.com wrote:
> 
> > On Mon,  6 Nov 2017 15:56:40 -0500
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >   
> >> diff --git a/arch/powerpc/include/uapi/asm/unistd.h
> >> b/arch/powerpc/include/uapi/asm/unistd.h
> >> index b1980fcd56d5..972a7d68c143 100644
> >> --- a/arch/powerpc/include/uapi/asm/unistd.h
> >> +++ b/arch/powerpc/include/uapi/asm/unistd.h
> >> @@ -396,5 +396,6 @@
> >>  #define __NR_kexec_file_load	382
> >>  #define __NR_statx		383
> >>  #define __NR_rseq		384
> >> +#define __NR_cpu_opv		385  
> > 
> > Sorry for bike shedding, but could we invest a few more keystrokes to
> > make these names a bit more readable?  
> 
> Whenever I try to make variables or function names more explicit, I can
> literally feel my conscience (taking the form of an angry Peter Zijlstra)
> breathing down my neck asking me to make them shorter. So I guess this is
> where it becomes a question of taste.

A specialist syscall is a bit different from a common function or variable,
though.

> 
> I think the "rseq" syscall name is short, to the point, and should be mostly
> fine.

I'm not sure if it's really "to the point". I think kexec_file_load
is better than kfload, for example :)

> For "cpu_opv", it was just a short name that fit the bill until a
> better idea would come.
> 
> I'm open to suggestions. Any color preference ? ;-)

What can you do within 16 characters?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v10 for 4.15 01/14] Restartable sequences system call
  2017-11-06 20:56 ` [RFC PATCH v10 for 4.15 01/14] Restartable sequences system call Mathieu Desnoyers
@ 2017-11-07  1:24     ` Boqun Feng
  0 siblings, 0 replies; 51+ messages in thread
From: Boqun Feng @ 2017-11-07  1:24 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Paul E . McKenney, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Alexander Viro

[-- Attachment #1: Type: text/plain, Size: 3543 bytes --]

On Mon, Nov 06, 2017 at 03:56:31PM -0500, Mathieu Desnoyers wrote:
[...]
> +
> +/*
> + * struct rseq is aligned on 4 * 8 bytes to ensure it is always
> + * contained within a single cache-line.
> + *
> + * A single struct rseq per thread is allowed.
> + */
> +struct rseq {
> +	/*
> +	 * Restartable sequences cpu_id_start field. Updated by the
> +	 * kernel, and read by user-space with single-copy atomicity
> +	 * semantics. Aligned on 32-bit. Always contain a value in the
> +	 * range of possible CPUs, although the value may not be the
> +	 * actual current CPU (e.g. if rseq is not initialized). This
> +	 * CPU number value should always be confirmed against the value
> +	 * of the cpu_id field.
> +	 */
> +	uint32_t cpu_id_start;
> +	/*
> +	 * Restartable sequences cpu_id field. Updated by the kernel,
> +	 * and read by user-space with single-copy atomicity semantics.
> +	 * Aligned on 32-bit. Values -1U and -2U have a special
> +	 * semantic: -1U means "rseq uninitialized", and -2U means "rseq
> +	 * initialization failed".
> +	 */
> +	uint32_t cpu_id;
> +	/*
> +	 * Restartable sequences rseq_cs field.
> +	 *
> +	 * Contains NULL when no critical section is active for the current
> +	 * thread, or holds a pointer to the currently active struct rseq_cs.
> +	 *
> +	 * Updated by user-space at the beginning of assembly instruction
> +	 * sequence block, and by the kernel when it restarts an assembly
> +	 * instruction sequence block, and when the kernel detects that it
> +	 * is preempting or delivering a signal outside of the range
> +	 * targeted by the rseq_cs. Also needs to be cleared by user-space
> +	 * before reclaiming memory that contains the targeted struct
> +	 * rseq_cs.
> +	 *
> +	 * Read and set by the kernel with single-copy atomicity semantics.
> +	 * Aligned on 64-bit.
> +	 */
> +	RSEQ_FIELD_u32_u64(rseq_cs);
> +	/*
> +	 * - RSEQ_DISABLE flag:
> +	 *
> +	 * Fallback fast-track flag for single-stepping.
> +	 * Set by user-space if lack of progress is detected.
> +	 * Cleared by user-space after rseq finish.
> +	 * Read by the kernel.
> +	 * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
> +	 *     Inhibit instruction sequence block restart and event
> +	 *     counter increment on preemption for this thread.

Nit: "event counter" has been removed entirely ;-)

> +	 * - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
> +	 *     Inhibit instruction sequence block restart and event
> +	 *     counter increment on signal delivery for this thread.
> +	 * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
> +	 *     Inhibit instruction sequence block restart and event
> +	 *     counter increment on migration for this thread.
> +	 */
> +	uint32_t flags;
> +} __attribute__((aligned(4 * sizeof(uint64_t))));
> +
> +#endif /* _UAPI_LINUX_RSEQ_H */
[...]
> +	} else {
> +		/*
> +		 * If there was no rseq previously registered,
> +		 * we need to ensure the provided rseq is
> +		 * properly aligned and valid.
> +		 */
> +		if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq))
> +				|| rseq_len != sizeof(*rseq))
> +			return -EINVAL;
> +		if (!access_ok(VERIFY_WRITE, rseq, rseq_len))
> +			return -EFAULT;
> +		current->rseq = rseq;
> +		current->rseq_len = rseq_len;
> +		current->rseq_sig = sig;
> +		/*
> +		 * If rseq was previously inactive, and has just
> +		 * been registered, ensure the cpu_id and
> +		 * event_counter fields are updated before

s/event_counter/cpu_id_start/ ?

Regards,
Boqun

> +		 * returning to user-space.
> +		 */
> +		rseq_set_notify_resume(current);
> +	}
> +
> +	return 0;
> +}
[...]
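
To make the layout under review concrete, here is a minimal user-space
sketch. It is not code from the patch: struct rseq_sketch and both helper
names are hypothetical, and rseq_cs is modelled as a plain uint64_t
because the definition of RSEQ_FIELD_u32_u64() is not quoted in this
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the quoted struct rseq field order. */
struct rseq_sketch {
	uint32_t cpu_id_start;	/* always within the range of possible CPUs */
	uint32_t cpu_id;	/* -1U: uninitialized, -2U: registration failed */
	uint64_t rseq_cs;	/* assumption: stands in for RSEQ_FIELD_u32_u64() */
	uint32_t flags;
} __attribute__((aligned(4 * sizeof(uint64_t))));

/* Mirrors the quoted alignment and length checks; returns 1 if both pass. */
static inline int rseq_sketch_args_ok(const void *p, size_t len)
{
	uintptr_t addr = (uintptr_t)p;

	if (addr & (__alignof__(struct rseq_sketch) - 1))
		return 0;
	return len == sizeof(struct rseq_sketch);
}

/* Per the quoted comment, -1U and -2U are special values, not CPU numbers. */
static inline int rseq_sketch_cpu_id_usable(uint32_t cpu_id)
{
	return cpu_id != (uint32_t)-1 && cpu_id != (uint32_t)-2;
}
```

With this layout the 20 bytes of fields are padded out to 32 bytes, so an
object that satisfies the alignment check sits within a single 32-byte
aligned block, which is what the comment at the top of the struct relies
on.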

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v2 for 4.15 08/14] Provide cpu_opv system call
@ 2017-11-07  2:07     ` Boqun Feng
  0 siblings, 0 replies; 51+ messages in thread
From: Boqun Feng @ 2017-11-07  2:07 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Paul E . McKenney, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk

[-- Attachment #1: Type: text/plain, Size: 5334 bytes --]

On Mon, Nov 06, 2017 at 03:56:38PM -0500, Mathieu Desnoyers wrote:
[...]
> +static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
> +		struct page ***pinned_pages_ptr, size_t *nr_pinned,
> +		int write)
> +{
> +	struct page *pages[2];
> +	int ret, nr_pages;
> +
> +	if (!len)
> +		return 0;
> +	nr_pages = cpu_op_range_nr_pages(addr, len);
> +	BUG_ON(nr_pages > 2);
> +	if (*nr_pinned + nr_pages > NR_PINNED_PAGES_ON_STACK) {

Is this a bug? It seems you will kzalloc() on every call once *nr_pinned
exceeds NR_PINNED_PAGES_ON_STACK, which will leak memory.

I think the logic here is complex enough for us to introduce a
structure, like:

	struct cpu_opv_page_pinner {
		int nr_pinned;
		bool is_kmalloc;
		struct page **pinned_pages;
	};

Thoughts?

Regards,
Boqun

> +		struct page **pinned_pages =
> +			kzalloc(CPU_OP_VEC_LEN_MAX * CPU_OP_MAX_PAGES
> +				* sizeof(struct page *), GFP_KERNEL);
> +		if (!pinned_pages)
> +			return -ENOMEM;
> +		memcpy(pinned_pages, *pinned_pages_ptr,
> +			*nr_pinned * sizeof(struct page *));
> +		*pinned_pages_ptr = pinned_pages;
> +	}
> +again:
> +	ret = get_user_pages_fast(addr, nr_pages, write, pages);
> +	if (ret < nr_pages) {
> +		if (ret > 0)
> +			put_page(pages[0]);
> +		return -EFAULT;
> +	}
> +	/*
> +	 * Refuse device pages, the zero page, pages in the gate area,
> +	 * and special mappings.
> +	 */
> +	ret = cpu_op_check_pages(pages, nr_pages);
> +	if (ret == -EAGAIN) {
> +		put_page(pages[0]);
> +		if (nr_pages > 1)
> +			put_page(pages[1]);
> +		goto again;
> +	}
> +	if (ret)
> +		goto error;
> +	(*pinned_pages_ptr)[(*nr_pinned)++] = pages[0];
> +	if (nr_pages > 1)
> +		(*pinned_pages_ptr)[(*nr_pinned)++] = pages[1];
> +	return 0;
> +
> +error:
> +	put_page(pages[0]);
> +	if (nr_pages > 1)
> +		put_page(pages[1]);
> +	return -EFAULT;
> +}
> +
> +static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
> +		struct page ***pinned_pages_ptr, size_t *nr_pinned)
> +{
> +	int ret, i;
> +	bool expect_fault = false;
> +
> +	/* Check access, pin pages. */
> +	for (i = 0; i < cpuopcnt; i++) {
> +		struct cpu_op *op = &cpuop[i];
> +
> +		switch (op->op) {
> +		case CPU_COMPARE_EQ_OP:
> +		case CPU_COMPARE_NE_OP:
> +			ret = -EFAULT;
> +			expect_fault = op->u.compare_op.expect_fault_a;
> +			if (!access_ok(VERIFY_READ, op->u.compare_op.a,
> +					op->len))
> +				goto error;
> +			ret = cpu_op_pin_pages(
> +					(unsigned long)op->u.compare_op.a,
> +					op->len, pinned_pages_ptr, nr_pinned, 0);
> +			if (ret)
> +				goto error;
> +			ret = -EFAULT;
> +			expect_fault = op->u.compare_op.expect_fault_b;
> +			if (!access_ok(VERIFY_READ, op->u.compare_op.b,
> +					op->len))
> +				goto error;
> +			ret = cpu_op_pin_pages(
> +					(unsigned long)op->u.compare_op.b,
> +					op->len, pinned_pages_ptr, nr_pinned, 0);
> +			if (ret)
> +				goto error;
> +			break;
> +		case CPU_MEMCPY_OP:
> +			ret = -EFAULT;
> +			expect_fault = op->u.memcpy_op.expect_fault_dst;
> +			if (!access_ok(VERIFY_WRITE, op->u.memcpy_op.dst,
> +					op->len))
> +				goto error;
> +			ret = cpu_op_pin_pages(
> +					(unsigned long)op->u.memcpy_op.dst,
> +					op->len, pinned_pages_ptr, nr_pinned, 1);
> +			if (ret)
> +				goto error;
> +			ret = -EFAULT;
> +			expect_fault = op->u.memcpy_op.expect_fault_src;
> +			if (!access_ok(VERIFY_READ, op->u.memcpy_op.src,
> +					op->len))
> +				goto error;
> +			ret = cpu_op_pin_pages(
> +					(unsigned long)op->u.memcpy_op.src,
> +					op->len, pinned_pages_ptr, nr_pinned, 0);
> +			if (ret)
> +				goto error;
> +			break;
> +		case CPU_ADD_OP:
> +			ret = -EFAULT;
> +			expect_fault = op->u.arithmetic_op.expect_fault_p;
> +			if (!access_ok(VERIFY_WRITE, op->u.arithmetic_op.p,
> +					op->len))
> +				goto error;
> +			ret = cpu_op_pin_pages(
> +					(unsigned long)op->u.arithmetic_op.p,
> +					op->len, pinned_pages_ptr, nr_pinned, 1);
> +			if (ret)
> +				goto error;
> +			break;
> +		case CPU_OR_OP:
> +		case CPU_AND_OP:
> +		case CPU_XOR_OP:
> +			ret = -EFAULT;
> +			expect_fault = op->u.bitwise_op.expect_fault_p;
> +			if (!access_ok(VERIFY_WRITE, op->u.bitwise_op.p,
> +					op->len))
> +				goto error;
> +			ret = cpu_op_pin_pages(
> +					(unsigned long)op->u.bitwise_op.p,
> +					op->len, pinned_pages_ptr, nr_pinned, 1);
> +			if (ret)
> +				goto error;
> +			break;
> +		case CPU_LSHIFT_OP:
> +		case CPU_RSHIFT_OP:
> +			ret = -EFAULT;
> +			expect_fault = op->u.shift_op.expect_fault_p;
> +			if (!access_ok(VERIFY_WRITE, op->u.shift_op.p,
> +					op->len))
> +				goto error;
> +			ret = cpu_op_pin_pages(
> +					(unsigned long)op->u.shift_op.p,
> +					op->len, pinned_pages_ptr, nr_pinned, 1);
> +			if (ret)
> +				goto error;
> +			break;
> +		case CPU_MB_OP:
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
> +	}
> +	return 0;
> +
> +error:
> +	for (i = 0; i < *nr_pinned; i++)
> +		put_page((*pinned_pages_ptr)[i]);
> +	*nr_pinned = 0;
> +	/*
> +	 * If faulting access is expected, return EAGAIN to user-space.
> +	 * It allows user-space to distinguish between a fault caused by
> +	 * an access which is expect to fault (e.g. due to concurrent
> +	 * unmapping of underlying memory) from an unexpected fault from
> +	 * which a retry would not recover.
> +	 */
> +	if (ret == -EFAULT && expect_fault)
> +		return -EAGAIN;
> +	return ret;
> +}
[...]
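
The error translation at the end of the quoted cpu_opv_pin_pages() can be
modelled in user-space as below. This is a minimal sketch, not code from the
patch: the helper name translate_pin_error() is hypothetical, and it only
reproduces the final "ret == -EFAULT && expect_fault" step, assuming the
caller has already recorded expect_fault for the access that failed.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not in the patch): models the last step of
 * cpu_opv_pin_pages(). A fault on an access that was flagged as
 * "expected to fault" is reported as -EAGAIN so user-space can retry;
 * every other return value is passed through unchanged.
 */
static int translate_pin_error(int ret, bool expect_fault)
{
	if (ret == -EFAULT && expect_fault)
		return -EAGAIN;
	return ret;
}
```

With this shape, an -ENOMEM or -EINVAL is never turned into a retry, even
when the failing access was marked as expected to fault.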


* Re: [RFC PATCH v10 for 4.15 01/14] Restartable sequences system call
  2017-11-07  1:24     ` Boqun Feng
@ 2017-11-07  2:20       ` Mathieu Desnoyers
  -1 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-07  2:20 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, Paul E. McKenney, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Alexander Viro

----- On Nov 6, 2017, at 8:24 PM, Boqun Feng boqun.feng@gmail.com wrote:

> On Mon, Nov 06, 2017 at 03:56:31PM -0500, Mathieu Desnoyers wrote:
> [...]
>> +
>> +/*
>> + * struct rseq is aligned on 4 * 8 bytes to ensure it is always
>> + * contained within a single cache-line.
>> + *
>> + * A single struct rseq per thread is allowed.
>> + */
>> +struct rseq {
>> +	/*
>> +	 * Restartable sequences cpu_id_start field. Updated by the
>> +	 * kernel, and read by user-space with single-copy atomicity
>> +	 * semantics. Aligned on 32-bit. Always contain a value in the
>> +	 * range of possible CPUs, although the value may not be the
>> +	 * actual current CPU (e.g. if rseq is not initialized). This
>> +	 * CPU number value should always be confirmed against the value
>> +	 * of the cpu_id field.
>> +	 */
>> +	uint32_t cpu_id_start;
>> +	/*
>> +	 * Restartable sequences cpu_id field. Updated by the kernel,
>> +	 * and read by user-space with single-copy atomicity semantics.
>> +	 * Aligned on 32-bit. Values -1U and -2U have a special
>> +	 * semantic: -1U means "rseq uninitialized", and -2U means "rseq
>> +	 * initialization failed".
>> +	 */
>> +	uint32_t cpu_id;
>> +	/*
>> +	 * Restartable sequences rseq_cs field.
>> +	 *
>> +	 * Contains NULL when no critical section is active for the current
>> +	 * thread, or holds a pointer to the currently active struct rseq_cs.
>> +	 *
>> +	 * Updated by user-space at the beginning of assembly instruction
>> +	 * sequence block, and by the kernel when it restarts an assembly
>> +	 * instruction sequence block, and when the kernel detects that it
>> +	 * is preempting or delivering a signal outside of the range
>> +	 * targeted by the rseq_cs. Also needs to be cleared by user-space
>> +	 * before reclaiming memory that contains the targeted struct
>> +	 * rseq_cs.
>> +	 *
>> +	 * Read and set by the kernel with single-copy atomicity semantics.
>> +	 * Aligned on 64-bit.
>> +	 */
>> +	RSEQ_FIELD_u32_u64(rseq_cs);
>> +	/*
>> +	 * - RSEQ_DISABLE flag:
>> +	 *
>> +	 * Fallback fast-track flag for single-stepping.
>> +	 * Set by user-space if lack of progress is detected.
>> +	 * Cleared by user-space after rseq finish.
>> +	 * Read by the kernel.
>> +	 * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
>> +	 *     Inhibit instruction sequence block restart and event
>> +	 *     counter increment on preemption for this thread.
> 
> Nit: "event counter" has been removed entirely ;-)
> 
>> +	 * - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
>> +	 *     Inhibit instruction sequence block restart and event
>> +	 *     counter increment on signal delivery for this thread.
>> +	 * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
>> +	 *     Inhibit instruction sequence block restart and event
>> +	 *     counter increment on migration for this thread.
>> +	 */
>> +	uint32_t flags;
>> +} __attribute__((aligned(4 * sizeof(uint64_t))));
>> +
>> +#endif /* _UAPI_LINUX_RSEQ_H */
> [...]
>> +	} else {
>> +		/*
>> +		 * If there was no rseq previously registered,
>> +		 * we need to ensure the provided rseq is
>> +		 * properly aligned and valid.
>> +		 */
>> +		if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq))
>> +				|| rseq_len != sizeof(*rseq))
>> +			return -EINVAL;
>> +		if (!access_ok(VERIFY_WRITE, rseq, rseq_len))
>> +			return -EFAULT;
>> +		current->rseq = rseq;
>> +		current->rseq_len = rseq_len;
>> +		current->rseq_sig = sig;
>> +		/*
>> +		 * If rseq was previously inactive, and has just
>> +		 * been registered, ensure the cpu_id and
>> +		 * event_counter fields are updated before
> 
> s/event_counter/cpu_start_id/ ?

I guess you mean "cpu_id_start".

Good point. v11 will include that fix. Meanwhile, it is available
at https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/log/?h=rseq/dev

Thanks,

Mathieu

> 
> Regards,
> Boqun
> 
>> +		 * returning to user-space.
>> +		 */
>> +		rseq_set_notify_resume(current);
>> +	}
>> +
>> +	return 0;
>> +}
> [...]

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
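
The alignment and length checks in the quoted registration path can be
illustrated with a small user-space model. This is a sketch under stated
assumptions: struct rseq_model and rseq_model_check() are hypothetical names,
the rseq_cs field is simplified to a plain uint64_t instead of
RSEQ_FIELD_u32_u64(), and the access_ok() test is left out because it has no
user-space equivalent.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for the UAPI struct rseq. Only its size and its
 * 4 * sizeof(uint64_t) = 32-byte alignment matter for this sketch.
 */
struct rseq_model {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;	/* simplified; the patch uses RSEQ_FIELD_u32_u64() */
	uint32_t flags;
} __attribute__((aligned(4 * sizeof(uint64_t))));

/*
 * Hypothetical helper mirroring the checks in the quoted else-branch:
 * reject a misaligned pointer or a length that does not match the
 * structure size with -EINVAL.
 */
static int rseq_model_check(const void *rseq, size_t rseq_len)
{
	if ((uintptr_t)rseq % __alignof__(struct rseq_model) != 0 ||
	    rseq_len != sizeof(struct rseq_model))
		return -EINVAL;
	return 0;
}
```

Because of the aligned attribute, sizeof(struct rseq_model) is padded up to
32 bytes, so a caller passing the unpadded 20-byte field total would be
rejected by the length check.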


* Re: [RFC PATCH v2 for 4.15 08/14] Provide cpu_opv system call
  2017-11-07  2:07     ` Boqun Feng
@ 2017-11-07  2:40       ` Mathieu Desnoyers
  -1 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-07  2:40 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, Paul E. McKenney, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk

[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]

----- On Nov 6, 2017, at 9:07 PM, Boqun Feng boqun.feng@gmail.com wrote:

> On Mon, Nov 06, 2017 at 03:56:38PM -0500, Mathieu Desnoyers wrote:
> [...]
>> +static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
>> +		struct page ***pinned_pages_ptr, size_t *nr_pinned,
>> +		int write)
>> +{
>> +	struct page *pages[2];
>> +	int ret, nr_pages;
>> +
>> +	if (!len)
>> +		return 0;
>> +	nr_pages = cpu_op_range_nr_pages(addr, len);
>> +	BUG_ON(nr_pages > 2);
>> +	if (*nr_pinned + nr_pages > NR_PINNED_PAGES_ON_STACK) {
> 
> Is this a bug? It seems you will kzalloc() on every call once *nr_pinned
> exceeds NR_PINNED_PAGES_ON_STACK, which will leak memory.
> 
> I think the logic here is complex enough for us to introduce a
> structure, like:
> 
>	struct cpu_opv_page_pinner {
>		int nr_pinned;
>		bool is_kmalloc;
>		struct page **pinned_pages;
>	};
> 
> Thoughts?

Good catch!

How about the attached diff? I'll fold it into the rseq/dev tree.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

[-- Attachment #2: fix-cpu-opv-leak.patch --]
[-- Type: text/x-patch; name=fix-cpu-opv-leak.patch, Size: 5793 bytes --]

diff --git a/kernel/cpu_opv.c b/kernel/cpu_opv.c
index 09754bbe6a4f..3d8fd66416a0 100644
--- a/kernel/cpu_opv.c
+++ b/kernel/cpu_opv.c
@@ -46,6 +46,12 @@ union op_fn_data {
 #endif
 };
 
+struct cpu_opv_pinned_pages {
+	struct page **pages;
+	size_t nr;
+	bool is_kmalloc;
+};
+
 typedef int (*op_fn_t)(union op_fn_data *data, uint64_t v, uint32_t len);
 
 static DEFINE_MUTEX(cpu_opv_offline_lock);
@@ -217,8 +223,7 @@ static int cpu_op_check_pages(struct page **pages,
 }
 
 static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
-		struct page ***pinned_pages_ptr, size_t *nr_pinned,
-		int write)
+		struct cpu_opv_pinned_pages *pin_pages, int write)
 {
 	struct page *pages[2];
 	int ret, nr_pages;
@@ -227,15 +232,17 @@ static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
 		return 0;
 	nr_pages = cpu_op_range_nr_pages(addr, len);
 	BUG_ON(nr_pages > 2);
-	if (*nr_pinned + nr_pages > NR_PINNED_PAGES_ON_STACK) {
+	if (!pin_pages->is_kmalloc && pin_pages->nr + nr_pages
+			> NR_PINNED_PAGES_ON_STACK) {
 		struct page **pinned_pages =
 			kzalloc(CPU_OP_VEC_LEN_MAX * CPU_OP_MAX_PAGES
 				* sizeof(struct page *), GFP_KERNEL);
 		if (!pinned_pages)
 			return -ENOMEM;
-		memcpy(pinned_pages, *pinned_pages_ptr,
-			*nr_pinned * sizeof(struct page *));
-		*pinned_pages_ptr = pinned_pages;
+		memcpy(pinned_pages, pin_pages->pages,
+			pin_pages->nr * sizeof(struct page *));
+		pin_pages->pages = pinned_pages;
+		pin_pages->is_kmalloc = true;
 	}
 again:
 	ret = get_user_pages_fast(addr, nr_pages, write, pages);
@@ -257,9 +264,9 @@ static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
 	}
 	if (ret)
 		goto error;
-	(*pinned_pages_ptr)[(*nr_pinned)++] = pages[0];
+	pin_pages->pages[pin_pages->nr++] = pages[0];
 	if (nr_pages > 1)
-		(*pinned_pages_ptr)[(*nr_pinned)++] = pages[1];
+		pin_pages->pages[pin_pages->nr++] = pages[1];
 	return 0;
 
 error:
@@ -270,7 +277,7 @@ static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
 }
 
 static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
-		struct page ***pinned_pages_ptr, size_t *nr_pinned)
+		struct cpu_opv_pinned_pages *pin_pages)
 {
 	int ret, i;
 	bool expect_fault = false;
@@ -289,7 +296,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.compare_op.a,
-					op->len, pinned_pages_ptr, nr_pinned, 0);
+					op->len, pin_pages, 0);
 			if (ret)
 				goto error;
 			ret = -EFAULT;
@@ -299,7 +306,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.compare_op.b,
-					op->len, pinned_pages_ptr, nr_pinned, 0);
+					op->len, pin_pages, 0);
 			if (ret)
 				goto error;
 			break;
@@ -311,7 +318,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.memcpy_op.dst,
-					op->len, pinned_pages_ptr, nr_pinned, 1);
+					op->len, pin_pages, 1);
 			if (ret)
 				goto error;
 			ret = -EFAULT;
@@ -321,7 +328,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.memcpy_op.src,
-					op->len, pinned_pages_ptr, nr_pinned, 0);
+					op->len, pin_pages, 0);
 			if (ret)
 				goto error;
 			break;
@@ -333,7 +340,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.arithmetic_op.p,
-					op->len, pinned_pages_ptr, nr_pinned, 1);
+					op->len, pin_pages, 1);
 			if (ret)
 				goto error;
 			break;
@@ -347,7 +354,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.bitwise_op.p,
-					op->len, pinned_pages_ptr, nr_pinned, 1);
+					op->len, pin_pages, 1);
 			if (ret)
 				goto error;
 			break;
@@ -360,7 +367,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.shift_op.p,
-					op->len, pinned_pages_ptr, nr_pinned, 1);
+					op->len, pin_pages, 1);
 			if (ret)
 				goto error;
 			break;
@@ -373,9 +380,9 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 	return 0;
 
 error:
-	for (i = 0; i < *nr_pinned; i++)
-		put_page((*pinned_pages_ptr)[i]);
-	*nr_pinned = 0;
+	for (i = 0; i < pin_pages->nr; i++)
+		put_page(pin_pages->pages[i]);
+	pin_pages->nr = 0;
 	/*
 	 * If faulting access is expected, return EAGAIN to user-space.
 	 * It allows user-space to distinguish between a fault caused by
@@ -923,9 +930,12 @@ SYSCALL_DEFINE4(cpu_opv, struct cpu_op __user *, ucpuopv, int, cpuopcnt,
 {
 	struct cpu_op cpuopv[CPU_OP_VEC_LEN_MAX];
 	struct page *pinned_pages_on_stack[NR_PINNED_PAGES_ON_STACK];
-	struct page **pinned_pages = pinned_pages_on_stack;
+	struct cpu_opv_pinned_pages pin_pages = {
+		.pages = pinned_pages_on_stack,
+		.nr = 0,
+		.is_kmalloc = false,
+	};
 	int ret, i;
-	size_t nr_pinned = 0;
 
 	if (unlikely(flags))
 		return -EINVAL;
@@ -938,15 +948,14 @@ SYSCALL_DEFINE4(cpu_opv, struct cpu_op __user *, ucpuopv, int, cpuopcnt,
 	ret = cpu_opv_check(cpuopv, cpuopcnt);
 	if (ret)
 		return ret;
-	ret = cpu_opv_pin_pages(cpuopv, cpuopcnt,
-				&pinned_pages, &nr_pinned);
+	ret = cpu_opv_pin_pages(cpuopv, cpuopcnt, &pin_pages);
 	if (ret)
 		goto end;
 	ret = do_cpu_opv(cpuopv, cpuopcnt, cpu);
-	for (i = 0; i < nr_pinned; i++)
-		put_page(pinned_pages[i]);
+	for (i = 0; i < pin_pages.nr; i++)
+		put_page(pin_pages.pages[i]);
 end:
-	if (pinned_pages != pinned_pages_on_stack)
-		kfree(pinned_pages);
+	if (pin_pages.is_kmalloc)
+		kfree(pin_pages.pages);
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v2 for 4.15 08/14] Provide cpu_opv system call
@ 2017-11-07  2:40       ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2017-11-07  2:40 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, Paul E. McKenney, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas

[-- Attachment #1: Type: text/plain, Size: 1082 bytes --]

----- On Nov 6, 2017, at 9:07 PM, Boqun Feng boqun.feng@gmail.com wrote:

> On Mon, Nov 06, 2017 at 03:56:38PM -0500, Mathieu Desnoyers wrote:
> [...]
>> +static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
>> +		struct page ***pinned_pages_ptr, size_t *nr_pinned,
>> +		int write)
>> +{
>> +	struct page *pages[2];
>> +	int ret, nr_pages;
>> +
>> +	if (!len)
>> +		return 0;
>> +	nr_pages = cpu_op_range_nr_pages(addr, len);
>> +	BUG_ON(nr_pages > 2);
>> +	if (*nr_pinned + nr_pages > NR_PINNED_PAGES_ON_STACK) {
> 
> Is this a bug? Seems you will kzalloc() every time if *nr_pinned is
> bigger than NR_PINNED_PAGES_ON_STACK, which will result in memory
> leaking.
> 
> I think the logic here is complex enough for us to introduce a
> structure, like:
> 
>	struct cpu_opv_page_pinner {
>		int nr_pinned;
>		bool is_kmalloc;
>		struct page **pinned_pages;
>	};
> 
> Thoughts?

Good catch !

How about the attached diff ? I'll fold it into the rseq/dev tree.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: fix-cpu-opv-leak.patch --]
[-- Type: text/x-patch; name=fix-cpu-opv-leak.patch, Size: 5793 bytes --]

diff --git a/kernel/cpu_opv.c b/kernel/cpu_opv.c
index 09754bbe6a4f..3d8fd66416a0 100644
--- a/kernel/cpu_opv.c
+++ b/kernel/cpu_opv.c
@@ -46,6 +46,12 @@ union op_fn_data {
 #endif
 };
 
+struct cpu_opv_pinned_pages {
+	struct page **pages;
+	size_t nr;
+	bool is_kmalloc;
+};
+
 typedef int (*op_fn_t)(union op_fn_data *data, uint64_t v, uint32_t len);
 
 static DEFINE_MUTEX(cpu_opv_offline_lock);
@@ -217,8 +223,7 @@ static int cpu_op_check_pages(struct page **pages,
 }
 
 static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
-		struct page ***pinned_pages_ptr, size_t *nr_pinned,
-		int write)
+		struct cpu_opv_pinned_pages *pin_pages, int write)
 {
 	struct page *pages[2];
 	int ret, nr_pages;
@@ -227,15 +232,17 @@ static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
 		return 0;
 	nr_pages = cpu_op_range_nr_pages(addr, len);
 	BUG_ON(nr_pages > 2);
-	if (*nr_pinned + nr_pages > NR_PINNED_PAGES_ON_STACK) {
+	if (!pin_pages->is_kmalloc && pin_pages->nr + nr_pages
+			> NR_PINNED_PAGES_ON_STACK) {
 		struct page **pinned_pages =
 			kzalloc(CPU_OP_VEC_LEN_MAX * CPU_OP_MAX_PAGES
 				* sizeof(struct page *), GFP_KERNEL);
 		if (!pinned_pages)
 			return -ENOMEM;
-		memcpy(pinned_pages, *pinned_pages_ptr,
-			*nr_pinned * sizeof(struct page *));
-		*pinned_pages_ptr = pinned_pages;
+		memcpy(pinned_pages, pin_pages->pages,
+			pin_pages->nr * sizeof(struct page *));
+		pin_pages->pages = pinned_pages;
+		pin_pages->is_kmalloc = true;
 	}
 again:
 	ret = get_user_pages_fast(addr, nr_pages, write, pages);
@@ -257,9 +264,9 @@ static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
 	}
 	if (ret)
 		goto error;
-	(*pinned_pages_ptr)[(*nr_pinned)++] = pages[0];
+	pin_pages->pages[pin_pages->nr++] = pages[0];
 	if (nr_pages > 1)
-		(*pinned_pages_ptr)[(*nr_pinned)++] = pages[1];
+		pin_pages->pages[pin_pages->nr++] = pages[1];
 	return 0;
 
 error:
@@ -270,7 +277,7 @@ static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
 }
 
 static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
-		struct page ***pinned_pages_ptr, size_t *nr_pinned)
+		struct cpu_opv_pinned_pages *pin_pages)
 {
 	int ret, i;
 	bool expect_fault = false;
@@ -289,7 +296,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.compare_op.a,
-					op->len, pinned_pages_ptr, nr_pinned, 0);
+					op->len, pin_pages, 0);
 			if (ret)
 				goto error;
 			ret = -EFAULT;
@@ -299,7 +306,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.compare_op.b,
-					op->len, pinned_pages_ptr, nr_pinned, 0);
+					op->len, pin_pages, 0);
 			if (ret)
 				goto error;
 			break;
@@ -311,7 +318,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.memcpy_op.dst,
-					op->len, pinned_pages_ptr, nr_pinned, 1);
+					op->len, pin_pages, 1);
 			if (ret)
 				goto error;
 			ret = -EFAULT;
@@ -321,7 +328,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.memcpy_op.src,
-					op->len, pinned_pages_ptr, nr_pinned, 0);
+					op->len, pin_pages, 0);
 			if (ret)
 				goto error;
 			break;
@@ -333,7 +340,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.arithmetic_op.p,
-					op->len, pinned_pages_ptr, nr_pinned, 1);
+					op->len, pin_pages, 1);
 			if (ret)
 				goto error;
 			break;
@@ -347,7 +354,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.bitwise_op.p,
-					op->len, pinned_pages_ptr, nr_pinned, 1);
+					op->len, pin_pages, 1);
 			if (ret)
 				goto error;
 			break;
@@ -360,7 +367,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 				goto error;
 			ret = cpu_op_pin_pages(
 					(unsigned long)op->u.shift_op.p,
-					op->len, pinned_pages_ptr, nr_pinned, 1);
+					op->len, pin_pages, 1);
 			if (ret)
 				goto error;
 			break;
@@ -373,9 +380,9 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
 	return 0;
 
 error:
-	for (i = 0; i < *nr_pinned; i++)
-		put_page((*pinned_pages_ptr)[i]);
-	*nr_pinned = 0;
+	for (i = 0; i < pin_pages->nr; i++)
+		put_page(pin_pages->pages[i]);
+	pin_pages->nr = 0;
 	/*
 	 * If faulting access is expected, return EAGAIN to user-space.
 	 * It allows user-space to distinguish between a fault caused by
@@ -923,9 +930,12 @@ SYSCALL_DEFINE4(cpu_opv, struct cpu_op __user *, ucpuopv, int, cpuopcnt,
 {
 	struct cpu_op cpuopv[CPU_OP_VEC_LEN_MAX];
 	struct page *pinned_pages_on_stack[NR_PINNED_PAGES_ON_STACK];
-	struct page **pinned_pages = pinned_pages_on_stack;
+	struct cpu_opv_pinned_pages pin_pages = {
+		.pages = pinned_pages_on_stack,
+		.nr = 0,
+		.is_kmalloc = false,
+	};
 	int ret, i;
-	size_t nr_pinned = 0;
 
 	if (unlikely(flags))
 		return -EINVAL;
@@ -938,15 +948,14 @@ SYSCALL_DEFINE4(cpu_opv, struct cpu_op __user *, ucpuopv, int, cpuopcnt,
 	ret = cpu_opv_check(cpuopv, cpuopcnt);
 	if (ret)
 		return ret;
-	ret = cpu_opv_pin_pages(cpuopv, cpuopcnt,
-				&pinned_pages, &nr_pinned);
+	ret = cpu_opv_pin_pages(cpuopv, cpuopcnt, &pin_pages);
 	if (ret)
 		goto end;
 	ret = do_cpu_opv(cpuopv, cpuopcnt, cpu);
-	for (i = 0; i < nr_pinned; i++)
-		put_page(pinned_pages[i]);
+	for (i = 0; i < pin_pages.nr; i++)
+		put_page(pin_pages.pages[i]);
 end:
-	if (pinned_pages != pinned_pages_on_stack)
-		kfree(pinned_pages);
+	if (pin_pages.is_kmalloc)
+		kfree(pin_pages.pages);
 	return ret;
 }
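
The leak discussed above, and the guard that closes it, can be
reproduced outside the kernel. What follows is only a minimal
user-space sketch, not the kernel code: calloc()/free() stand in for
kzalloc()/kfree(), and the names pinned_slots, reserve_slots,
push_slots, release_slots, heap_allocs, ON_STACK_SLOTS and MAX_SLOTS
are hypothetical. The point it illustrates is that the array is moved
from the stack to the heap at most once, because the is_heap flag
(playing the role of is_kmalloc) is checked before allocating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ON_STACK_SLOTS	4	/* stands in for NR_PINNED_PAGES_ON_STACK */
#define MAX_SLOTS	32	/* stands in for CPU_OP_VEC_LEN_MAX * CPU_OP_MAX_PAGES */

struct pinned_slots {
	void **slots;	/* caller's on-stack array first, heap array after growth */
	size_t nr;	/* number of slots currently in use */
	bool is_heap;	/* plays the role of is_kmalloc in the patch */
};

static int heap_allocs;	/* counts heap allocations so a leak would be visible */

/* Switch to the heap array the first time the on-stack array would overflow. */
static int reserve_slots(struct pinned_slots *p, size_t want)
{
	if (!p->is_heap && p->nr + want > ON_STACK_SLOTS) {
		void **heap = calloc(MAX_SLOTS, sizeof(*heap));

		if (!heap)
			return -1;
		heap_allocs++;
		memcpy(heap, p->slots, p->nr * sizeof(*heap));
		p->slots = heap;
		p->is_heap = true;	/* without this flag, every later call would allocate again */
	}
	return 0;
}

/* Record one or two entries, mirroring the one-or-two pages per operand. */
static int push_slots(struct pinned_slots *p, void *first, void *second)
{
	size_t want = second ? 2 : 1;

	assert(want <= 2);
	if (p->nr + want > MAX_SLOTS)
		return -1;
	if (reserve_slots(p, want))
		return -1;
	p->slots[p->nr++] = first;
	if (second)
		p->slots[p->nr++] = second;
	return 0;
}

/* Free the heap array only if one was actually allocated. */
static void release_slots(struct pinned_slots *p)
{
	if (p->is_heap)
		free(p->slots);
	p->slots = NULL;
	p->nr = 0;
	p->is_heap = false;
}
```

A caller would initialise the structure the same way the syscall does
in the diff, pointing .slots at an on-stack array with .nr = 0 and
.is_heap = false, and call release_slots() on every exit path. Dropping
the !p->is_heap test brings back the behaviour Boqun pointed out: once
nr exceeds the on-stack size, each call allocates a fresh array and the
previous heap array is lost.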

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH v2 for 4.15 08/14] Provide cpu_opv system call
@ 2017-11-07  3:03         ` Boqun Feng
  0 siblings, 0 replies; 51+ messages in thread
From: Boqun Feng @ 2017-11-07  3:03 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Paul E. McKenney, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk

[-- Attachment #1: Type: text/plain, Size: 7640 bytes --]

On Tue, Nov 07, 2017 at 02:40:37AM +0000, Mathieu Desnoyers wrote:
> ----- On Nov 6, 2017, at 9:07 PM, Boqun Feng boqun.feng@gmail.com wrote:
> 
> > On Mon, Nov 06, 2017 at 03:56:38PM -0500, Mathieu Desnoyers wrote:
> > [...]
> >> +static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
> >> +		struct page ***pinned_pages_ptr, size_t *nr_pinned,
> >> +		int write)
> >> +{
> >> +	struct page *pages[2];
> >> +	int ret, nr_pages;
> >> +
> >> +	if (!len)
> >> +		return 0;
> >> +	nr_pages = cpu_op_range_nr_pages(addr, len);
> >> +	BUG_ON(nr_pages > 2);
> >> +	if (*nr_pinned + nr_pages > NR_PINNED_PAGES_ON_STACK) {
> > 
> > Is this a bug? Seems you will kzalloc() every time if *nr_pinned is
> > bigger than NR_PINNED_PAGES_ON_STACK, which will result in memory
> > leaking.
> > 
> > I think the logic here is complex enough for us to introduce a
> > structure, like:
> > 
> >	struct cpu_opv_page_pinner {
> >		int nr_pinned;
> >		bool is_kmalloc;
> >		struct page **pinned_pages;
> >	};
> > 
> > Thoughts?
> 
> Good catch !
> 
> How about the attached diff ? I'll fold it into the rseq/dev tree.
> 

Looks good to me ;-)

Regards,
Boqun

> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

> diff --git a/kernel/cpu_opv.c b/kernel/cpu_opv.c
> index 09754bbe6a4f..3d8fd66416a0 100644
> --- a/kernel/cpu_opv.c
> +++ b/kernel/cpu_opv.c
> @@ -46,6 +46,12 @@ union op_fn_data {
>  #endif
>  };
>  
> +struct cpu_opv_pinned_pages {
> +	struct page **pages;
> +	size_t nr;
> +	bool is_kmalloc;
> +};
> +
>  typedef int (*op_fn_t)(union op_fn_data *data, uint64_t v, uint32_t len);
>  
>  static DEFINE_MUTEX(cpu_opv_offline_lock);
> @@ -217,8 +223,7 @@ static int cpu_op_check_pages(struct page **pages,
>  }
>  
>  static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
> -		struct page ***pinned_pages_ptr, size_t *nr_pinned,
> -		int write)
> +		struct cpu_opv_pinned_pages *pin_pages, int write)
>  {
>  	struct page *pages[2];
>  	int ret, nr_pages;
> @@ -227,15 +232,17 @@ static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
>  		return 0;
>  	nr_pages = cpu_op_range_nr_pages(addr, len);
>  	BUG_ON(nr_pages > 2);
> -	if (*nr_pinned + nr_pages > NR_PINNED_PAGES_ON_STACK) {
> +	if (!pin_pages->is_kmalloc && pin_pages->nr + nr_pages
> +			> NR_PINNED_PAGES_ON_STACK) {
>  		struct page **pinned_pages =
>  			kzalloc(CPU_OP_VEC_LEN_MAX * CPU_OP_MAX_PAGES
>  				* sizeof(struct page *), GFP_KERNEL);
>  		if (!pinned_pages)
>  			return -ENOMEM;
> -		memcpy(pinned_pages, *pinned_pages_ptr,
> -			*nr_pinned * sizeof(struct page *));
> -		*pinned_pages_ptr = pinned_pages;
> +		memcpy(pinned_pages, pin_pages->pages,
> +			pin_pages->nr * sizeof(struct page *));
> +		pin_pages->pages = pinned_pages;
> +		pin_pages->is_kmalloc = true;
>  	}
>  again:
>  	ret = get_user_pages_fast(addr, nr_pages, write, pages);
> @@ -257,9 +264,9 @@ static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
>  	}
>  	if (ret)
>  		goto error;
> -	(*pinned_pages_ptr)[(*nr_pinned)++] = pages[0];
> +	pin_pages->pages[pin_pages->nr++] = pages[0];
>  	if (nr_pages > 1)
> -		(*pinned_pages_ptr)[(*nr_pinned)++] = pages[1];
> +		pin_pages->pages[pin_pages->nr++] = pages[1];
>  	return 0;
>  
>  error:
> @@ -270,7 +277,7 @@ static int cpu_op_pin_pages(unsigned long addr, unsigned long len,
>  }
>  
>  static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
> -		struct page ***pinned_pages_ptr, size_t *nr_pinned)
> +		struct cpu_opv_pinned_pages *pin_pages)
>  {
>  	int ret, i;
>  	bool expect_fault = false;
> @@ -289,7 +296,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
>  				goto error;
>  			ret = cpu_op_pin_pages(
>  					(unsigned long)op->u.compare_op.a,
> -					op->len, pinned_pages_ptr, nr_pinned, 0);
> +					op->len, pin_pages, 0);
>  			if (ret)
>  				goto error;
>  			ret = -EFAULT;
> @@ -299,7 +306,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
>  				goto error;
>  			ret = cpu_op_pin_pages(
>  					(unsigned long)op->u.compare_op.b,
> -					op->len, pinned_pages_ptr, nr_pinned, 0);
> +					op->len, pin_pages, 0);
>  			if (ret)
>  				goto error;
>  			break;
> @@ -311,7 +318,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
>  				goto error;
>  			ret = cpu_op_pin_pages(
>  					(unsigned long)op->u.memcpy_op.dst,
> -					op->len, pinned_pages_ptr, nr_pinned, 1);
> +					op->len, pin_pages, 1);
>  			if (ret)
>  				goto error;
>  			ret = -EFAULT;
> @@ -321,7 +328,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
>  				goto error;
>  			ret = cpu_op_pin_pages(
>  					(unsigned long)op->u.memcpy_op.src,
> -					op->len, pinned_pages_ptr, nr_pinned, 0);
> +					op->len, pin_pages, 0);
>  			if (ret)
>  				goto error;
>  			break;
> @@ -333,7 +340,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
>  				goto error;
>  			ret = cpu_op_pin_pages(
>  					(unsigned long)op->u.arithmetic_op.p,
> -					op->len, pinned_pages_ptr, nr_pinned, 1);
> +					op->len, pin_pages, 1);
>  			if (ret)
>  				goto error;
>  			break;
> @@ -347,7 +354,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
>  				goto error;
>  			ret = cpu_op_pin_pages(
>  					(unsigned long)op->u.bitwise_op.p,
> -					op->len, pinned_pages_ptr, nr_pinned, 1);
> +					op->len, pin_pages, 1);
>  			if (ret)
>  				goto error;
>  			break;
> @@ -360,7 +367,7 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
>  				goto error;
>  			ret = cpu_op_pin_pages(
>  					(unsigned long)op->u.shift_op.p,
> -					op->len, pinned_pages_ptr, nr_pinned, 1);
> +					op->len, pin_pages, 1);
>  			if (ret)
>  				goto error;
>  			break;
> @@ -373,9 +380,9 @@ static int cpu_opv_pin_pages(struct cpu_op *cpuop, int cpuopcnt,
>  	return 0;
>  
>  error:
> -	for (i = 0; i < *nr_pinned; i++)
> -		put_page((*pinned_pages_ptr)[i]);
> -	*nr_pinned = 0;
> +	for (i = 0; i < pin_pages->nr; i++)
> +		put_page(pin_pages->pages[i]);
> +	pin_pages->nr = 0;
>  	/*
>  	 * If faulting access is expected, return EAGAIN to user-space.
>  	 * It allows user-space to distinguish between a fault caused by
> @@ -923,9 +930,12 @@ SYSCALL_DEFINE4(cpu_opv, struct cpu_op __user *, ucpuopv, int, cpuopcnt,
>  {
>  	struct cpu_op cpuopv[CPU_OP_VEC_LEN_MAX];
>  	struct page *pinned_pages_on_stack[NR_PINNED_PAGES_ON_STACK];
> -	struct page **pinned_pages = pinned_pages_on_stack;
> +	struct cpu_opv_pinned_pages pin_pages = {
> +		.pages = pinned_pages_on_stack,
> +		.nr = 0,
> +		.is_kmalloc = false,
> +	};
>  	int ret, i;
> -	size_t nr_pinned = 0;
>  
>  	if (unlikely(flags))
>  		return -EINVAL;
> @@ -938,15 +948,14 @@ SYSCALL_DEFINE4(cpu_opv, struct cpu_op __user *, ucpuopv, int, cpuopcnt,
>  	ret = cpu_opv_check(cpuopv, cpuopcnt);
>  	if (ret)
>  		return ret;
> -	ret = cpu_opv_pin_pages(cpuopv, cpuopcnt,
> -				&pinned_pages, &nr_pinned);
> +	ret = cpu_opv_pin_pages(cpuopv, cpuopcnt, &pin_pages);
>  	if (ret)
>  		goto end;
>  	ret = do_cpu_opv(cpuopv, cpuopcnt, cpu);
> -	for (i = 0; i < nr_pinned; i++)
> -		put_page(pinned_pages[i]);
> +	for (i = 0; i < pin_pages.nr; i++)
> +		put_page(pin_pages.pages[i]);
>  end:
> -	if (pinned_pages != pinned_pages_on_stack)
> -		kfree(pinned_pages);
> +	if (pin_pages.is_kmalloc)
> +		kfree(pin_pages.pages);
>  	return ret;
>  }


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] mm, sparse: do not swamp log with huge vmemmap allocation failures
  2017-11-06  9:22 ` Michal Hocko
@ 2017-11-07  9:06   ` Michal Hocko
  -1 siblings, 0 replies; 51+ messages in thread
From: Michal Hocko @ 2017-11-07  9:06 UTC (permalink / raw)
  To: Andrew Morton, Johannes Weiner; +Cc: Vlastimil Babka, linux-mm, LKML

Dohh, I forgot to git add the follow-up fix on top of Johannes' original
diff, so it didn't make it into the final commit. Could you fold this
into the patch, Andrew, please?

Sorry about that.
---
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 3f85084cb8bb..9a745e2a6f9a 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -62,7 +62,7 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
 			return page_address(page);
 
 		if (!warned) {
-			warn_alloc(gfp_mask, NULL, "vmemmap alloc failure: order:%u", order);
+			warn_alloc(gfp_mask & ~__GFP_NOWARN, NULL, "vmemmap alloc failure: order:%u", order);
 			warned = true;
 		}
 		return NULL;
-- 
Michal Hocko
SUSE Labs
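
For readers following along, the effect of the one-line change can be
sketched in user space. This is an illustration under stated
assumptions, not kernel code: demo_warn_alloc() is a hypothetical
stand-in for warn_alloc(), which returns without printing when
__GFP_NOWARN is set in the mask it is given, and the DEMO_GFP_* bit
values are made up. It assumes, as the fix implies, that the mask used
for the allocation carries the no-warn bit, so the single explicit
warning would be swallowed unless that bit is cleared first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define DEMO_GFP_KERNEL	0x0c0u	/* hypothetical bit values, for illustration only */
#define DEMO_GFP_NOWARN	0x200u

static int warnings_emitted;	/* counts warnings that were actually printed */

/* Stand-in for warn_alloc(): stays silent when the no-warn bit is set. */
static void demo_warn_alloc(unsigned int gfp_mask, unsigned int order)
{
	if (gfp_mask & DEMO_GFP_NOWARN)
		return;
	fprintf(stderr, "vmemmap alloc failure: order:%u\n", order);
	warnings_emitted++;
}

/* Warn once per process, clearing the no-warn bit so the warning is printed. */
static void demo_report_failure(unsigned int gfp_mask, unsigned int order)
{
	static bool warned;

	if (!warned) {
		unsigned int warn_mask = gfp_mask & ~DEMO_GFP_NOWARN;

		assert(!(warn_mask & DEMO_GFP_NOWARN));
		demo_warn_alloc(warn_mask, order);
		warned = true;
	}
}
```

Passing the unmodified mask straight to demo_warn_alloc() prints
nothing, which corresponds to the line the patch removes. Going through
demo_report_failure() prints exactly one message no matter how often
the failure repeats.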

^ permalink raw reply related	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2017-11-07  9:06 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-06 20:56 [RFC PATCH for 4.15 00/14] Restartable sequences and CPU op vector v10 Mathieu Desnoyers
2017-11-06 20:56 ` Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH v10 for 4.15 01/14] Restartable sequences system call Mathieu Desnoyers
2017-11-07  1:24   ` Boqun Feng
2017-11-07  1:24     ` Boqun Feng
2017-11-07  2:20     ` Mathieu Desnoyers
2017-11-07  2:20       ` Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH for 4.15 02/14] Restartable sequences: ARM 32 architecture support Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH for 4.15 03/14] Restartable sequences: wire up ARM 32 system call Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH for 4.15 04/14] Restartable sequences: x86 32/64 architecture support Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH for 4.15 05/14] Restartable sequences: wire up x86 32/64 system call Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH for 4.15 06/14] Restartable sequences: powerpc architecture support Mathieu Desnoyers
2017-11-06 20:56   ` Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH for 4.15 07/14] Restartable sequences: Wire up powerpc system call Mathieu Desnoyers
2017-11-06 20:56   ` Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH v2 for 4.15 08/14] Provide cpu_opv " Mathieu Desnoyers
2017-11-07  2:07   ` Boqun Feng
2017-11-07  2:07     ` Boqun Feng
2017-11-07  2:40     ` Mathieu Desnoyers
2017-11-07  2:40       ` Mathieu Desnoyers
2017-11-07  3:03       ` Boqun Feng
2017-11-07  3:03         ` Boqun Feng
2017-11-06 20:56 ` [RFC PATCH for 4.15 09/14] cpu_opv: Wire up x86 32/64 " Mathieu Desnoyers
2017-11-06 20:56   ` Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH for 4.15 10/14] cpu_opv: Wire up powerpc " Mathieu Desnoyers
2017-11-06 20:56   ` Mathieu Desnoyers
2017-11-07  0:37   ` Nicholas Piggin
2017-11-07  0:37     ` Nicholas Piggin
2017-11-07  0:47     ` Mathieu Desnoyers
2017-11-07  0:47       ` Mathieu Desnoyers
2017-11-07  1:21       ` Nicholas Piggin
2017-11-07  1:21         ` Nicholas Piggin
2017-11-06 20:56 ` [RFC PATCH for 4.15 11/14] cpu_opv: Wire up ARM32 " Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH v2 for 4.15 12/14] cpu_opv: Implement selftests Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH v2 for 4.15 13/14] Restartable sequences: Provide self-tests Mathieu Desnoyers
2017-11-06 20:56   ` Mathieu Desnoyers
2017-11-06 20:56 ` [RFC PATCH for 4.15 14/14] Restartable sequences selftests: arm: workaround gcc asm size guess Mathieu Desnoyers
2017-11-06 20:56   ` Mathieu Desnoyers
  -- strict thread matches above, loose matches on Subject: below --
2017-11-06  9:22 [PATCH] mm, sparse: do not swamp log with huge vmemmap allocation failures Michal Hocko
2017-11-06  9:22 ` Michal Hocko
2017-11-06 17:35 ` Johannes Weiner
2017-11-06 17:35   ` Johannes Weiner
2017-11-06 17:57   ` Joe Perches
2017-11-06 18:14 ` Khalid Aziz
2017-11-06 18:14   ` Khalid Aziz
2017-11-06 18:18   ` Michal Hocko
2017-11-06 18:18     ` Michal Hocko
2017-11-06 20:17     ` Khalid Aziz
2017-11-06 20:17       ` Khalid Aziz
2017-11-07  9:06 ` Michal Hocko
2017-11-07  9:06   ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.