* hardlink util -- files de-duplication
@ 2018-06-01 11:38 Karel Zak
  2018-06-01 13:08 ` Ruediger Meier
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Karel Zak @ 2018-06-01 11:38 UTC (permalink / raw)
  To: util-linux


For the last 17 years the hardlink(1) util has been available in Red Hat
based distros; from man hardlink:

   hardlink traverses one or more directories searching for duplicate
   files.  When it finds duplicate files, it uses one of them as the
   master.  It then removes all other duplicates and places a hardlink
   for each one pointing to the master file.  This allows for
   conservation of disk space where multiple directories on a single
   filesystem contain many duplicate files.

   ...
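
[Editorial sketch, not part of the original message.] The behaviour the man
page excerpt describes (walk the directories, pick one file of each duplicate
set as the master, replace the others with hard links to it) can be
illustrated roughly as below. This is only a minimal sketch under stated
assumptions: the function name hardlink_duplicates and the ".hardlink-tmp"
suffix are hypothetical, and files are matched on size and content only,
which may differ from the criteria the real hardlink.c applies.

```python
import filecmp
import os
import stat
from collections import defaultdict


def hardlink_duplicates(root):
    """Replace duplicate regular files under root with hard links.

    Returns the number of files that were replaced by a link.
    Hypothetical helper for illustration; not the real hardlink(1) logic.
    """
    # Group candidates by (device, size): hard links cannot cross
    # filesystems, and files of different size cannot be duplicates.
    by_key = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Assumption: skip symlinks, special files and empty files.
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                by_key[(st.st_dev, st.st_size)].append(path)

    linked = 0
    for paths in by_key.values():
        masters = []  # one representative per distinct content
        for path in sorted(paths):
            for master in masters:
                if os.path.samefile(master, path):
                    break  # already the same inode, nothing to do
                if filecmp.cmp(master, path, shallow=False):
                    # Link under a temporary name, then rename over the
                    # duplicate, so the path never disappears midway.
                    tmp = path + ".hardlink-tmp"
                    os.link(master, tmp)
                    os.replace(tmp, path)
                    linked += 1
                    break
            else:
                masters.append(path)  # new content, becomes a master
    return linked
```

Calling hardlink_duplicates("/some/tree") a second time on the same tree
should report 0, since the duplicates already share an inode with their
master.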

The util is a little bit orphaned. What about adding it to util-linux
to make it available for other distros and keep it maintained in a
serious way? ;-)

   https://src.fedoraproject.org/cgit/rpms/hardlink.git/

It's one .c file.

Comments & objections?

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: hardlink util -- files de-duplication
  2018-06-01 11:38 hardlink util -- files de-duplication Karel Zak
@ 2018-06-01 13:08 ` Ruediger Meier
  2018-06-01 20:20   ` Kevin Fenzi
  2018-06-01 13:25 ` Aurélien Aptel
  2018-06-01 13:29 ` Dmitry V. Levin
  2 siblings, 1 reply; 11+ messages in thread
From: Ruediger Meier @ 2018-06-01 13:08 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux, Kevin Fenzi

On Friday 01 June 2018, Karel Zak wrote:
> For the last 17 years the hardlink(1) util has been available in Red Hat
> based distros; from man hardlink:
>
>    hardlink traverses one or more directories searching for duplicate
>    files.  When it finds duplicate files, it uses one of them as the
>    master.  It then removes all other duplicates and places a hardlink
>    for each one pointing to the master file.  This allows for
>    conservation of disk space where multiple directories on a single
>    filesystem contain many duplicate files.
>
>    ...
>
> The util is a little bit orphaned. What about adding it to util-linux
> to make it available for other distros and keep it maintained in a
> serious way? ;-)

+1

>
>    https://src.fedoraproject.org/cgit/rpms/hardlink.git/

The original and almost identical repo is this:
  https://pagure.io/hardlink.git

I've CC'ed the project admin Kevin Fenzi.

> It's one .c file.
>
> Comments & objections?
>
>     Karel


* Re: hardlink util -- files de-duplication
  2018-06-01 11:38 hardlink util -- files de-duplication Karel Zak
  2018-06-01 13:08 ` Ruediger Meier
@ 2018-06-01 13:25 ` Aurélien Aptel
  2018-06-01 13:45   ` Samuel Thibault
  2018-06-06 12:02   ` Carlos Santos
  2018-06-01 13:29 ` Dmitry V. Levin
  2 siblings, 2 replies; 11+ messages in thread
From: Aurélien Aptel @ 2018-06-01 13:25 UTC (permalink / raw)
  To: Karel Zak, util-linux

Karel Zak <kzak@redhat.com> writes:
> Comments & objections?

Not objecting but I feel like I should mention there are multiple
well-established alternatives:

https://github.com/tobiasschulz/fdupes
http://freedup.org/
https://rdfind.pauldreik.se/
https://github.com/markfasheh/duperemove

-- 
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)


* Re: hardlink util -- files de-duplication
  2018-06-01 11:38 hardlink util -- files de-duplication Karel Zak
  2018-06-01 13:08 ` Ruediger Meier
  2018-06-01 13:25 ` Aurélien Aptel
@ 2018-06-01 13:29 ` Dmitry V. Levin
  2018-06-12 10:55   ` Karel Zak
  2 siblings, 1 reply; 11+ messages in thread
From: Dmitry V. Levin @ 2018-06-01 13:29 UTC (permalink / raw)
  To: Karel Zak; +Cc: Alexey Gladkov, util-linux

On Fri, Jun 01, 2018 at 01:38:07PM +0200, Karel Zak wrote:
> For the last 17 years the hardlink(1) util has been available in Red Hat
> based distros; from man hardlink:
>
>    hardlink traverses one or more directories searching for duplicate
>    files.  When it finds duplicate files, it uses one of them as the
>    master.  It then removes all other duplicates and places a hardlink
>    for each one pointing to the master file.  This allows for
>    conservation of disk space where multiple directories on a single
>    filesystem contain many duplicate files.
>
>    ...
>
> The util is a little bit orphaned. What about adding it to util-linux
> to make it available for other distros and keep it maintained in a
> serious way? ;-)
> 
>    https://src.fedoraproject.org/cgit/rpms/hardlink.git/
> 
> It's one .c file.
> 
> Comments & objections?

Better late than never.

BTW, our hardlink package has some Owl patches applied,
please remind us to rebase and submit them. ;)


-- 
ldv


* Re: hardlink util -- files de-duplication
  2018-06-01 13:25 ` Aurélien Aptel
@ 2018-06-01 13:45   ` Samuel Thibault
  2018-06-06 12:02   ` Carlos Santos
  1 sibling, 0 replies; 11+ messages in thread
From: Samuel Thibault @ 2018-06-01 13:45 UTC (permalink / raw)
  To: Aurélien Aptel; +Cc: Karel Zak, util-linux

Aurélien Aptel, le ven. 01 juin 2018 15:25:48 +0200, a ecrit:
> Karel Zak <kzak@redhat.com> writes:
> > Comments & objections?
> 
> Not objecting but I feel like I should mention there are multiple
> well-established alternatives:
> 
> https://github.com/tobiasschulz/fdupes
> http://freedup.org/
> https://rdfind.pauldreik.se/
> https://github.com/markfasheh/duperemove

Yes, in Debian it was mentioned that we'd need a de-duplication tool for
de-duplication tools :)

Samuel


* Re: hardlink util -- files de-duplication
  2018-06-01 13:08 ` Ruediger Meier
@ 2018-06-01 20:20   ` Kevin Fenzi
  2018-06-02  0:00     ` Francisco J. Tsao Santin
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Fenzi @ 2018-06-01 20:20 UTC (permalink / raw)
  To: Ruediger Meier, Karel Zak; +Cc: util-linux, tsao


On 06/01/2018 06:08 AM, Ruediger Meier wrote:
> On Friday 01 June 2018, Karel Zak wrote:
>> For the last 17 years the hardlink(1) util has been available in Red Hat
>> based distros; from man hardlink:
>>
>>    hardlink traverses one or more directories searching for duplicate
>>    files.  When it finds duplicate files, it uses one of them as the
>>    master.  It then removes all other duplicates and places a hardlink
>>    for each one pointing to the master file.  This allows for
>>    conservation of disk space where multiple directories on a single
>>    filesystem contain many duplicate files.
>>
>>    ...
>>
>> The util is a little bit orphaned. What about adding it to util-linux
>> to make it available for other distros and keep it maintained in a
>> serious way? ;-)
> 
> +1
> 
>>
>>    https://src.fedoraproject.org/cgit/rpms/hardlink.git/
> 
> The original and almost identical repo is this:
>   https://pagure.io/hardlink.git
> 
> I've CC'ed the project admin Kevin Fenzi.


I've also added my co-maintainer (tsao@fedoraproject.org).
He's done most of the recent work on it, I haven't had time to do much
with it at all.

I'd personally be in favor of it moving into util-linux. Hopefully it
would get more time and attention there and more widespread use.

If tsao agrees, lets make it happen.

kevin



* Re: hardlink util -- files de-duplication
  2018-06-01 20:20   ` Kevin Fenzi
@ 2018-06-02  0:00     ` Francisco J. Tsao Santin
  0 siblings, 0 replies; 11+ messages in thread
From: Francisco J. Tsao Santin @ 2018-06-02  0:00 UTC (permalink / raw)
  To: Kevin Fenzi; +Cc: Ruediger Meier, Karel Zak, util-linux, tsao

On Fri, 1 Jun 2018, Kevin Fenzi wrote:

> On 06/01/2018 06:08 AM, Ruediger Meier wrote:
> > On Friday 01 June 2018, Karel Zak wrote:
> >> The util is a little bit orphaned. What about adding it to util-linux
> >> to make it available for other distros and keep it maintained in a
> >> serious way? ;-)
> > 
> 
> I've also added my co-maintainer (tsao@fedoraproject.org).
> He's done most of the recent work on it, I haven't had time to do much
> with it at all.

Yep, I would like to have more time to improve the tool a bit but... I
have only added some bug-fix patches.

> 
> I'd personally be in favor of it moving into util-linux. Hopefully it
> would get more time and attention there and more widespread use.
> 
> If tsao agrees, lets make it happen.

Good for me too :-) I only want to point out a small issue: as I suppose you
know, there is another hardlink tool with similar functions, (re)written in
Python by the Debian people[1][2]. In fact, the name of the file is the
same (but they are placed in different paths, /usr/bin vs /usr/sbin).
I hope it doesn't cause a distro-war ;-)
 
[1] https://packages.debian.org/sid/utils/hardlink
[2] https://jak-linux.org/projects/hardlink/
-- 
Francisco Javier Tsao Santín
http://gattaca.es
1024D/71CF4D62  42 F1 53 35 EF 98 98 8A FC 6C 56 B3 4C A7 7D FB


* Re: hardlink util -- files de-duplication
  2018-06-01 13:25 ` Aurélien Aptel
  2018-06-01 13:45   ` Samuel Thibault
@ 2018-06-06 12:02   ` Carlos Santos
  1 sibling, 0 replies; 11+ messages in thread
From: Carlos Santos @ 2018-06-06 12:02 UTC (permalink / raw)
  To: Aurélien Aptel; +Cc: Karel Zak, util-linux

> From: "Aurélien Aptel" <aaptel@suse.com>
> To: "Karel Zak" <kzak@redhat.com>, "util-linux" <util-linux@vger.kernel.org>
> Sent: Friday, June 1, 2018 10:25:48 AM
> Subject: Re: hardlink util -- files de-duplication

> Karel Zak <kzak@redhat.com> writes:
>> Comments & objections?
>
> Not objecting but I feel like I should mention there are multiple
> well-established alternatives:
>
> https://github.com/tobiasschulz/fdupes
> http://freedup.org/
> https://rdfind.pauldreik.se/
> https://github.com/markfasheh/duperemove

Compared to hardlink, fdupes has a richer feature set (e.g. dedup soft
links). FreeDup has its own life and does not seem to be adoptable.
Duperemove is complex software which requires glib2 and sqlite3.

-- 
Carlos Santos (Casantos) - DATACOM, P&D
“Marched towards the enemy, spear upright, armed with the certainty
that only the ignorant can have.” -- Epitaph of a volunteer


* Re: hardlink util -- files de-duplication
  2018-06-01 13:29 ` Dmitry V. Levin
@ 2018-06-12 10:55   ` Karel Zak
  2018-06-12 11:22     ` Ruediger Meier
  0 siblings, 1 reply; 11+ messages in thread
From: Karel Zak @ 2018-06-12 10:55 UTC (permalink / raw)
  To: Alexey Gladkov, util-linux

On Fri, Jun 01, 2018 at 04:29:30PM +0300, Dmitry V. Levin wrote:
> On Fri, Jun 01, 2018 at 01:38:07PM +0200, Karel Zak wrote:
> > For the last 17 years the hardlink(1) util has been available in Red Hat
> > based distros; from man hardlink:
> >
> >    hardlink traverses one or more directories searching for duplicate
> >    files.  When it finds duplicate files, it uses one of them as the
> >    master.  It then removes all other duplicates and places a hardlink
> >    for each one pointing to the master file.  This allows for
> >    conservation of disk space where multiple directories on a single
> >    filesystem contain many duplicate files.
> >
> >    ...
> >
> > The util is a little bit orphaned. What about adding it to util-linux
> > to make it available for other distros and keep it maintained in a
> > serious way? ;-)
> > 
> >    https://src.fedoraproject.org/cgit/rpms/hardlink.git/
> > 
> > It's one .c file.
> > 
> > Comments & objections?
> 
> Better late than never.
> 
> BTW, our hardlink package has some Owl patches applied,
> please remind us to rebase and submit them. ;)

It seems there is no strong objection to hardlink. So, I
think we can add it as *optional* (--enable-hardlink) to util-linux.
IMHO it's a good idea to have such a tool in the basic Linux toolset.

The long-term goal should be to add more features
to make it more attractive to users who have to use other
alternatives now :-)


I won't have enough time in the next two weeks to work on this task (fix
indentation, reuse some lib/ stuff, etc.), so any volunteer(s)? ;-)

    Karel


-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: hardlink util -- files de-duplication
  2018-06-12 10:55   ` Karel Zak
@ 2018-06-12 11:22     ` Ruediger Meier
  2018-06-12 12:12       ` Karel Zak
  0 siblings, 1 reply; 11+ messages in thread
From: Ruediger Meier @ 2018-06-12 11:22 UTC (permalink / raw)
  To: Karel Zak; +Cc: Alexey Gladkov, util-linux

On Tuesday 12 June 2018, Karel Zak wrote:
> I won't have enough time in the next two weeks to work on this task (fix
> indentation, reuse some lib/ stuff, etc.), so any volunteer(s)? ;-)

I would do this, maybe next week or so. Should it go to sys-utils or  
misc-utils?

cu,
Rudi


* Re: hardlink util -- files de-duplication
  2018-06-12 11:22     ` Ruediger Meier
@ 2018-06-12 12:12       ` Karel Zak
  0 siblings, 0 replies; 11+ messages in thread
From: Karel Zak @ 2018-06-12 12:12 UTC (permalink / raw)
  To: Ruediger Meier; +Cc: Alexey Gladkov, util-linux

On Tue, Jun 12, 2018 at 01:22:45PM +0200, Ruediger Meier wrote:
> On Tuesday 12 June 2018, Karel Zak wrote:
> > I won't have enough time in the next two weeks to work on this task (fix
> > indentation, reuse some lib/ stuff, etc.), so any volunteer(s)? ;-)
> 
> I would do this, maybe next week or so. Should it go to sys-utils or  
> misc-utils?

Thanks! I think misc-utils is better in this case. 

The sys-utils directory should be used for kernel API wrappers (ioctl,
syscalls, sysfs, etc) -- hmm... why do we have kill.c in misc-utils? :-)

(But it's a "color of the bikeshed" topic, so better not to start this
discussion :-)

Note that the best way is to use "indent --linux" for the first patch,
and make other changes to the code in additional patches. That way we will
be able to keep track of our local changes.

    Karel


-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com
