Zap's Digital Lighthouse

Sat, 17 Mar 2012

Content Addressable Storage

Because I am backing up a whole bunch of machines that may have identical files on them (many copies of Windows, old backups, etc.), I am interested in finding ways of having n copies of a file not occupy n times the amount of storage.

I see three ways of doing this:

  1. Using ZFS deduplication on my backup server. This sounds good, but all of the ZFS configuration guides I've seen recommend having lots of RAM when you use dedup (the rule of thumb usually quoted works out to several GB of RAM per TB of deduplicated data), and my little backup servers have only 2 to 4 GB of RAM, so this seems like a non-starter.
  2. Using a manual dedup script that uses hardlinks to free up disk space on a Unix volume... something like dedup from Roderick Schertler (documented here), trimtrees on CPAN, hardlink from Julian Andres Klode, opendedup, or fileuniq on SourceForge (there is actually a list here). That sounds attractive, but I am not sure how long it would take on 1 TB of data, and I am not sure hardlinks will work well on files owned by multiple users (see the sketch after this list).
  3. Then, there is the possibility of Content Addressable Storage. EMC does this on its Centera product line, and it is quite useful in the corporate space... in the open-source world, I have been interested in Poul-Henning Kamp's stow package for a while (see here for the old version or here for the new version that phk is working on this year).
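
To make option 2 a bit more concrete, here is a rough Python sketch (not any of the tools named above) of what such a hardlink pass could look like: hash every file and replace duplicates with hardlinks, but only when owner, group, and mode also match, which is exactly the multi-user worry raised in item 2. The /backup default path is just a placeholder.

#!/usr/bin/env python
# Minimal sketch of a hash-based hardlink pass: walk a tree, group files by
# content hash plus owner/group/mode, and replace duplicates with hardlinks.
import hashlib
import os
import sys

def file_digest(path, bufsize=1 << 20):
    # Hash the file in chunks so large backup images do not land in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

def dedup(root):
    seen = {}  # (digest, uid, gid, mode) -> canonical path
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            # Keying on uid/gid/mode avoids merging files that belong to
            # different users -- the multi-user concern mentioned above.
            key = (file_digest(path), st.st_uid, st.st_gid, st.st_mode)
            if key in seen:
                if not os.path.samefile(seen[key], path):
                    os.unlink(path)              # drop the duplicate...
                    os.link(seen[key], path)     # ...and hardlink it instead
            else:
                seen[key] = path

if __name__ == "__main__":
    dedup(sys.argv[1] if len(sys.argv) > 1 else "/backup")

How long this takes on 1 TB is dominated by having to read and hash every byte once, which is the open question from item 2.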

Stow looks interesting... I'll try to find some time to dig further into it.
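
The core idea behind content addressable storage is simple enough to sketch in a few lines: a blob is stored once under the digest of its contents, and that digest is the only address callers ever see, so n identical copies collapse to one blob on disk. This is only a toy illustration of the idea, not how stow itself works, and the /backup/cas directory is made up.

# Toy illustration of content-addressable storage (not phk's stow itself):
# a blob is stored once under the SHA-256 of its contents, and that digest
# is the only address callers ever see. STORE_DIR is a made-up location.
import hashlib
import os

STORE_DIR = "/backup/cas"

def blob_path(digest):
    # Fan out on the first two hex characters to keep directories small.
    return os.path.join(STORE_DIR, digest[:2], digest)

def put(data):
    """Store a blob; identical content always collapses to one file on disk."""
    digest = hashlib.sha256(data).hexdigest()
    path = blob_path(digest)
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    return digest        # the content address

def get(digest):
    """Fetch a blob back by its content address."""
    with open(blob_path(digest), "rb") as f:
        return f.read()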

/FreeBSD | Posted at 11:48 | permanent link