From: Luis Henriques <lhenriques@suse.com>
To: "Jeff Layton" <jlayton@kernel.org>
Cc: "Ilya Dryomov" <idryomov@gmail.com>, "Sage Weil" <sage@redhat.com>,
	<ceph-devel@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] ceph: allow object copies across different filesystems in the same cluster
Date: Mon, 09 Sep 2019 12:15:16 +0100	[thread overview]
Message-ID: <87d0g9oh4r.fsf@suse.com> (raw)
In-Reply-To: <3f838e42a50575595c7310386cf698aca8f89607.camel@kernel.org> (Jeff Layton's message of "Mon, 09 Sep 2019 06:35:07 -0400")

"Jeff Layton" <jlayton@kernel.org> writes:

> On Mon, 2019-09-09 at 11:28 +0100, Luis Henriques wrote:
>> OSDs are able to perform object copies across different pools.  Thus,
>> there's no need to prevent copy_file_range from doing remote copies if the
>> source and destination superblocks are different.  Only return -EXDEV if
>> they have different fsids (the cluster ID).
>>
>> Signed-off-by: Luis Henriques <lhenriques@suse.com>
>> ---
>>  fs/ceph/file.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> Hi,
>>
>> Here's the patch changelog since the initial submission:
>>
>> - Dropped have_fsid checks on client structs
>> - Use %pU to print the fsid instead of raw hex strings (%*ph)
>> - Fixed 'To:' field in email so that this time the patch hits vger
>>
>> Cheers,
>> --
>> Luis
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 685a03cc4b77..4a624a1dd0bb 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -1904,6 +1904,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>>  	struct ceph_inode_info *src_ci = ceph_inode(src_inode);
>>  	struct ceph_inode_info *dst_ci = ceph_inode(dst_inode);
>>  	struct ceph_cap_flush *prealloc_cf;
>> +	struct ceph_fs_client *src_fsc = ceph_inode_to_client(src_inode);
>>  	struct ceph_object_locator src_oloc, dst_oloc;
>>  	struct ceph_object_id src_oid, dst_oid;
>>  	loff_t endoff = 0, size;
>> @@ -1915,8 +1916,17 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>>
>>  	if (src_inode == dst_inode)
>>  		return -EINVAL;
>> -	if (src_inode->i_sb != dst_inode->i_sb)
>> -		return -EXDEV;
>> +	if (src_inode->i_sb != dst_inode->i_sb) {
>> +		struct ceph_fs_client *dst_fsc = ceph_inode_to_client(dst_inode);
>> +
>> +		if (ceph_fsid_compare(&src_fsc->client->fsid,
>> +				      &dst_fsc->client->fsid)) {
>> +			dout("Copying object across different clusters:");
>> +			dout(" src fsid: %pU dst fsid: %pU\n",
>> +			     &src_fsc->client->fsid, &dst_fsc->client->fsid);
>> +			return -EXDEV;
>> +		}
>> +	}
>
> Just to be clear: what happens here if I mount two entirely separate
> clusters, and their OSDs don't have any access to one another?  Will this
> fail at some later point with an error that we can catch so that we can
> fall back?

This is exactly what this check prevents: if we have two CephFS
filesystems from two unrelated clusters mounted and we try to copy a
file across them, the operation will fail with -EXDEV[1] because the
FSIDs for these two ceph_fs_client will be different.

OTOH, if these two filesystems are within the same cluster (and thus
have the same FSID), then the OSDs are able to do 'copy-from'
operations between them.  I've tested all these scenarios and they seem
to be handled correctly.

Now, I'm assuming that *all* OSDs within the same ceph cluster can
communicate among themselves; if this assumption is false, then this
patch is broken.  But again, I'm not aware of any mechanism that
prevents two OSDs from communicating with each other.

[1] Actually, the files will still be copied, because we'll fall back
to the default VFS generic_copy_file_range behaviour, which is to do
read+write operations.

Cheers,
--
Luis

>
>
>>  	if (ceph_snap(dst_inode) != CEPH_NOSNAP)
>>  		return -EROFS;
>>
>> @@ -1928,7 +1938,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>>  	 * efficient).
>>  	 */
>>
>> -	if (ceph_test_mount_opt(ceph_inode_to_client(src_inode), NOCOPYFROM))
>> +	if (ceph_test_mount_opt(src_fsc, NOCOPYFROM))
>>  		return -EOPNOTSUPP;
>>
>>  	if ((src_ci->i_layout.stripe_unit != dst_ci->i_layout.stripe_unit) ||
>> @@ -2044,7 +2054,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>>  		     dst_ci->i_vino.ino, dst_objnum);
>>  	/* Do an object remote copy */
>>  	err = ceph_osdc_copy_from(
>> -		&ceph_inode_to_client(src_inode)->client->osdc,
>> +		&src_fsc->client->osdc,
>>  		src_ci->i_vino.snap, 0,
>>  		&src_oid, &src_oloc,
>>  		CEPH_OSD_OP_FLAG_FADVISE_SEQUENTIAL |
Thread overview: 16+ messages  [~2019-09-09 11:15 UTC | newest]

[not found] <20190906135750.29543-1-lhenriques@suse.com>
[not found] ` <30b09cb015563913d073c488c8de8ba0cceedd7b.camel@kernel.org>
2019-09-06 16:26 ` [PATCH] ceph: allow object copies across different filesystems in the same cluster Luis Henriques
2019-09-07 13:53 ` Jeff Layton
2019-09-09 10:18 ` Luis Henriques
2019-09-09 10:28 ` [PATCH v2] ceph: allow object copies across different filesystems in the same cluster Luis Henriques
2019-09-09 10:35 ` Jeff Layton
2019-09-09 11:05 ` Jeff Layton
2019-09-09 13:55 ` Luis Henriques
2019-09-09 15:21 ` Jeff Layton
2019-09-09 11:15 ` Luis Henriques [this message]
2019-09-09 22:22 ` Gregory Farnum
2019-09-10 10:45 ` Luis Henriques
2019-09-09 10:51 ` Ilya Dryomov