From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael R. Hines"
Subject: Re: [PATCH] rdma: don't make pages writeable if not requiested
Date: Thu, 21 Mar 2013 08:23:48 -0400
Message-ID: <514AFBD4.2050201@linux.vnet.ibm.com>
References: <20130321061838.GA28319@redhat.com>
In-Reply-To: <20130321061838.GA28319@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-rdma-owner@vger.kernel.org
To: "Michael S. Tsirkin"
Cc: Roland Dreier, Sean Hefty, Hal Rosenstock, Yishai Hadas, Christoph Lameter, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org
List-Id: linux-rdma@vger.kernel.org

Yes, I'd be happy to try the patch. Got meetings all day... but will dive in soon.

On 03/21/2013 02:18 AM, Michael S. Tsirkin wrote:
> core/umem.c seems to get the arguments to get_user_pages
> in the reverse order: it sets writeable flag and
> breaks COW for MAP_SHARED if and only if hardware needs to
> write the page.
>
> This breaks memory overcommit for users such as KVM:
> each time we try to register a page to send it to remote, this
> breaks COW. It seems that for applications that only have
> REMOTE_READ permission, there is no reason to break COW at all.
>
> If the page that is COW has lots of copies, this makes the user process
> quickly exceed the cgroups memory limit. This makes RDMA mostly useless
> for virtualization, thus the stable tag.
>
> Reported-by: "Michael R. Hines"
> Cc: stable@vger.kernel.org
> Signed-off-by: Michael S. Tsirkin
> ---
>
> Note: compile-tested only, I don't have RDMA hardware at the moment.
> Michael, could you please try this patch (also fixing your
> userspace code not to request write access) and report?
>
> Note2: grep for get_user_pages in infiniband drivers turns up
> lots of users who set write to 1 unconditionally.
> These might be bugs too, should be checked.
>
> drivers/infiniband/core/umem.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index a841123..5929598 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -152,7 +152,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  		ret = get_user_pages(current, current->mm, cur_base,
>  				     min_t(unsigned long, npages,
>  					   PAGE_SIZE / sizeof (struct page *)),
> -				     1, !umem->writable, page_list, vma_list);
> +				     !umem->writable, 1, page_list, vma_list);
>
>  		if (ret < 0)
>  			goto out;