From: John Garry <john.garry@huawei.com>
To: Will Deacon <will@kernel.org>, Robin Murphy <robin.murphy@arm.com>
Cc: trivial@kernel.org, maz@kernel.org, linuxarm@huawei.com,
	linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency
Date: Thu, 16 Jul 2020 17:50:41 +0100
Message-ID: <cfd36aff-94ae-2019-3331-d43fba01070b@huawei.com>
In-Reply-To: <20200716113234.GA7290@willie-the-truck>

>>
>> Perhaps a silly question (I'm too engrossed in PMU world ATM to get properly
>> back up to speed on this), but couldn't this be done without cmpxchg anyway?
>> Instinctively it feels like instead of maintaining a literal software copy
>> of the prod value, we could resolve the "claim my slot in the queue" part
>> with atomic_fetch_add on a free-running 32-bit "pseudo-prod" index, then
>> whoever updates the hardware deals with the truncation and wrap bit to
>> convert it to an actual register value.
> 
> Maybe, but it's easier said than done. The hard part is figuring out that
> you have space and there's no way I'm touching that logic without a way to
> test this.
> 
> I'm also not keen to complicate the code because of systems that don't scale
> well with contended CAS [1]. If you put down loads of cores, you need an
> interconnect/coherence protocol that can handle it.
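
For illustration, the slot-claiming half of the idea quoted above could look
roughly like the following userspace C11 sketch. It is not driver code:
struct pseudo_q, claim_slots(), to_prod_reg() and the Q_SHIFT value are
hypothetical names, and the register layout assumed here (index in the low
Q_SHIFT bits, wrap flag in the bit directly above) is an assumption of the
sketch. The space check, which is the hard part, is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical queue size: 2^Q_SHIFT entries. */
#define Q_SHIFT 8u

static_assert(Q_SHIFT < 31u, "index plus wrap bit must fit in 32 bits");

/* Hypothetical software state: a free-running 32-bit producer counter. */
struct pseudo_q {
	_Atomic uint32_t pseudo_prod;
};

/*
 * Claim n consecutive slots with a single fetch-add, so there is no retry
 * loop. Returns the free-running index of the first claimed slot.
 * NOTE: no check that the queue actually has n free slots.
 */
static inline uint32_t claim_slots(struct pseudo_q *q, uint32_t n)
{
	return atomic_fetch_add_explicit(&q->pseudo_prod, n,
					 memory_order_relaxed);
}

/*
 * Truncate a free-running index to the assumed register format. Because
 * 2^32 is a multiple of 2^(Q_SHIFT + 1), the truncated value stays
 * consistent when the 32-bit counter itself overflows.
 */
static inline uint32_t to_prod_reg(uint32_t pseudo)
{
	return pseudo & ((1u << (Q_SHIFT + 1)) - 1u);
}

/* Slot index part of the truncated value. */
static inline uint32_t prod_idx(uint32_t reg)
{
	return reg & ((1u << Q_SHIFT) - 1u);
}

/* Wrap flag part of the truncated value. */
static inline uint32_t prod_wrap(uint32_t reg)
{
	return (reg >> Q_SHIFT) & 1u;
}
```

Whoever ends up writing the hardware register would call to_prod_reg() on
the free-running value; everyone else only ever does the fetch-add.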

JFYI, I added some debug to the driver to get the cmpxchg() attempt rate
while running a test harness (below):

cpus	rate
2	1.1
4	1.1
8	1.3
16	3.6
32	8.1
64	12.6
96	14.7

Ideal rate is 1. So it's not crazy high for many CPUs, but still 
drifting away from 1.
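
The debug code itself is not included in this mail. As a rough idea of how
such a rate can be counted, here is a minimal userspace C11 sketch, assuming
the rate means cmpxchg attempts per successful update. struct cas_q,
claim_with_cas() and attempt_rate() are hypothetical names, and the shared
atomic counters are only for brevity (in a real driver they would add
contention of their own, so per-CPU counters would be a better fit).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical state: a producer index plus two debug counters. */
struct cas_q {
	_Atomic uint32_t prod;
	_Atomic uint64_t attempts;   /* every compare-exchange issued */
	_Atomic uint64_t successes;  /* compare-exchanges that won */
};

/*
 * Advance prod by n using a compare-exchange retry loop, counting each
 * attempt. Returns the value of prod that this caller claimed from.
 */
static inline uint32_t claim_with_cas(struct cas_q *q, uint32_t n)
{
	uint32_t old;

	assert(q);
	old = atomic_load_explicit(&q->prod, memory_order_relaxed);

	for (;;) {
		atomic_fetch_add_explicit(&q->attempts, 1,
					  memory_order_relaxed);
		if (atomic_compare_exchange_strong_explicit(&q->prod, &old,
							    old + n,
							    memory_order_relaxed,
							    memory_order_relaxed))
			break;
		/* On failure, old was reloaded with the current value. */
	}

	atomic_fetch_add_explicit(&q->successes, 1, memory_order_relaxed);
	return old;
}

/* Attempts per successful update; 1.0 means no attempt was ever wasted. */
static inline double attempt_rate(struct cas_q *q)
{
	uint64_t s = atomic_load(&q->successes);

	return s ? (double)atomic_load(&q->attempts) / (double)s : 0.0;
}
```

Read this way, a rate of 14.7 means that roughly 13.7 out of every 14.7
attempts, about 93%, failed and had to be retried.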

John

> 
> Will
> 
> [1] https://lore.kernel.org/lkml/20190607072652.GA5522@hc/T/#m0d00f107c29223045933292a8b5b90d2ca9b7e5c
> .
> 

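//copied from Barry, thanks :)

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/semaphore.h>
#include <linux/dma-mapping.h>
#include <linux/slab.h>
#include <linux/jiffies.h>
#include <linux/string.h>
#include <linux/err.h>
#include <linux/smp.h>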

static int ways = 64;
module_param(ways, int, S_IRUGO);

static int seconds = 20;
module_param(seconds, int, S_IRUGO);

int mappings[NR_CPUS];
struct semaphore sem[NR_CPUS];


#define COMPLETIONS_SIZE 50

static noinline dma_addr_t test_mapsingle(struct device *dev, void *buf,
                                          int size)
{
     dma_addr_t dma_addr = dma_map_single(dev, buf, size, DMA_TO_DEVICE);
     return dma_addr;
}

static noinline void test_unmapsingle(struct device *dev, void *buf,
                                      int size, dma_addr_t dma_addr)
{
     dma_unmap_single(dev, dma_addr, size, DMA_TO_DEVICE);
}

static noinline void test_memcpy(void *out, void *in, int size)
{
     memcpy(out, in, size);
}

//just a hack to get a dev behind an SMMU
extern struct device *hisi_dev;

static int testthread(void *data)
{
     unsigned long stop = jiffies + seconds * HZ;
     struct device *dev = hisi_dev;
     char *inputs[COMPLETIONS_SIZE] = { NULL };
     char *outputs[COMPLETIONS_SIZE] = { NULL };
     dma_addr_t dma_addr[COMPLETIONS_SIZE];
     int i, ret = 0, cpu = smp_processor_id();
     struct semaphore *sem = data;

     for (i = 0; i < COMPLETIONS_SIZE; i++) {
         inputs[i] = kzalloc(4096, GFP_KERNEL);
         if (!inputs[i]) {
             ret = -ENOMEM;
             goto out;
         }
     }

     for (i = 0; i < COMPLETIONS_SIZE; i++) {
         outputs[i] = kzalloc(4096, GFP_KERNEL);
         if (!outputs[i]) {
             ret = -ENOMEM;
             goto out;
         }
     }

     /* map a batch, then unmap it, until the time budget runs out */
     while (time_before(jiffies, stop)) {
         for (i = 0; i < COMPLETIONS_SIZE; i++) {
             dma_addr[i] = test_mapsingle(dev, inputs[i], 4096);
             test_memcpy(outputs[i], inputs[i], 4096);
         }
         for (i = 0; i < COMPLETIONS_SIZE; i++) {
             test_unmapsingle(dev, inputs[i], 4096, dma_addr[i]);
         }
         mappings[cpu] += COMPLETIONS_SIZE;
     }

out:
     /* kfree(NULL) is a no-op, so this also covers a partial allocation */
     for (i = 0; i < COMPLETIONS_SIZE; i++) {
         kfree(outputs[i]);
         kfree(inputs[i]);
     }

     /* always signal completion so that smmu_test_core() cannot hang */
     up(sem);

     return ret;
}

void smmu_test_core(int cpus)
{
     struct task_struct *tsk;
     int i;
     int total_mappings = 0;

     /* guard the NR_CPUS-sized arrays and the divisions below */
     if (cpus <= 0 || cpus > NR_CPUS)
         return;

     ways = cpus;

     for (i = 0; i < ways; i++) {
         mappings[i] = 0;
         tsk = kthread_create_on_cpu(testthread, &sem[i], i, "map_test");

         if (IS_ERR(tsk)) {
             printk(KERN_ERR "create test thread failed\n");
             /* no thread will signal this semaphore, so do it here */
             up(&sem[i]);
             continue;
         }
         wake_up_process(tsk);
     }

     for (i = 0; i < ways; i++) {
         down(&sem[i]);
         total_mappings += mappings[i];
     }

     printk(KERN_ERR "finished total_mappings=%d (per way=%d) (rate=%d per second per cpu) ways=%d\n",
            total_mappings, total_mappings / ways,
            total_mappings / (seconds * ways), ways);
}
EXPORT_SYMBOL(smmu_test_core);


static int __init test_init(void)
{
     int i;

     for (i = 0; i < NR_CPUS; i++)
         sema_init(&sem[i], 0);

     return 0;
}

static void __exit test_exit(void)
{
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");

