Tiernan's Comms Closet

Geek, Programmer, Photographer, network engineer…


ZFS over multiple DVD/BD-R images

A couple of days back, I started thinking about archiving and backup software. I kind of have backups “sorted”: my MacBook Pro uses Backblaze to back up to the cloud, Time Machine backs it up to my Synology, my VMs on Proxmox are backed up to a Proxmox Backup Server off-site, my Synology and QNAPs are backed up to B2 and Hetzner, and there are some other bits and bobs… But for the archiving stuff, I am not really set up… So, I went looking for archiving software. Couldn’t find anything, so I asked on r/DataHoarder. Still no options at the time of posting, but someone did reply with the idea of using DVDs (or Blu-rays) for ZFS...

Ok, that’s just crazy, but in kind of a good way… kind of like the floppy RAID stuff I have seen… It does help with storing the data, plus it can tolerate losing some disks… but it needs some automation to make it work properly…

Assuming you are using this for archiving, you could automate building 5 ISO images, each just shy of 100GB, once a month, create a ZFS RAIDZ1, 2 or 3 pool out of them (depending on how paranoid you are) and then write your data to it. RAIDZ1 lets you lose 1 disk, giving you around 400GB of usable space. RAIDZ2 brings that up to 2 losable disks and 300GB, and RAIDZ3 is 3 disks and 200GB. I think RAIDZ2 would be your best bet, especially if you are using something like M-DISC and are storing them safely.
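As a rough sketch, the pool-building step could look something like this (file names, sizes and the pool name are placeholders, not a tested recipe):

#!/bin/sh
# Stage 5 image files just under 100GB and pool them as RAIDZ2
SET="archive-2023-03"
mkdir -p /staging/"$SET"
for i in 1 2 3 4 5; do
  truncate -s 93G /staging/"$SET"/disk"$i".img   # sparse files; 93GiB fits on a 100GB BD-R XL
done

# RAIDZ2: any 2 of the 5 images can be lost and the data still survives
zpool create "$SET" raidz2 \
  /staging/"$SET"/disk1.img /staging/"$SET"/disk2.img /staging/"$SET"/disk3.img \
  /staging/"$SET"/disk4.img /staging/"$SET"/disk5.img

# ...copy the month's data into /"$SET", then:
zpool export "$SET"
# burn each diskN.img to its own disk and label it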

Once finished, unmount the pool and send an email saying you need to write the ISOs to disk. Label each disk with a unique serial number (this is where the archiving software would be handy) plus the set details and number (so, March 2023 Disk 1/5).

If you need something from that backup, you stick the disks in the drives… You can do it with multiple drives: with 5 disks and RAIDZ1, you need to mount a minimum of 4 of them, RAIDZ2 needs 3 and RAIDZ3 needs a minimum of 2… Ideally, you would want all 5, allowing you to check every disk (ZFS scrub) and then get your files off.
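A restore could look roughly like this sketch: copy the images back off whatever disks you have and import the pool read-only (device names, paths and the pool name are placeholders; it assumes the images were burned raw rather than as files on a UDF disk):

#!/bin/sh
# Pull the images back off whichever disks you have
mkdir -p /restore/archive-2023-03
dd if=/dev/sr0 of=/restore/archive-2023-03/disk1.img bs=1M
dd if=/dev/sr1 of=/restore/archive-2023-03/disk2.img bs=1M
# ...repeat for as many disks as you have: 4 of 5 is enough for RAIDZ1,
# 3 of 5 for RAIDZ2, 2 of 5 for RAIDZ3

# Import the pool read-only from the recovered images
zpool import -d /restore/archive-2023-03 -o readonly=on archive-2023-03

# With all 5 images present you can verify everything with a scrub,
# but that needs a read-write import (drop readonly=on first):
# zpool scrub archive-2023-03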

A year of archiving would require 5 drives (say 100 quid a pop; USB makes things easier, internal is possibly cheaper) and 60 disks (I found 25-packs of 100GB M-DISCs on Amazon for around 500 EUR), costing a total of maybe 2k, with 15 extra disks…

Follow-up questions:

  • Does ZFS allow mounting a pool read-only?
  • Could you do this with rewritable Blu-ray disks? Could they be mounted directly and written to? Leave them in the drives for the month, let writes do their thing and then archive them once a month? It’s archived, so it doesn’t need to be fast…

Zerotier and Minio Followup

In a previous post, I talked about setting up a distributed S3-like data storage system using Minio and ZeroTier. Well, this week, the ZeroTier guys tweeted about it.

A few people then started asking questions and looking for a follow-up, so here it is…

First, a quick recap. I had 4 machines, all running Linux. Three of them were in one time zone (GMT+1) and one was in another (GMT). Looking at the Distributed Minio Quickstart Guide again, there is a mention of the machines’ clocks needing to be in sync… which is probably why this did not work as planned… and by “not work as planned”, I mean that Minio would crash, or not be responsive, or not write data where it should have… which was a pain. Looking at the documentation again, they also mention that Windows support is “experimental”, which means, hopefully, some day it will be not so experimental and might work… Given that most of the machines in the house are Windows boxes, that would be a nice feature.
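Keeping the clocks in sync is easy enough to check on systemd-based Linux boxes; a minimal sketch (not what I ran at the time):

# Show whether the clock is synchronised and NTP is active
timedatectl status

# Turn on NTP sync (systemd-timesyncd) if it is off
sudo timedatectl set-ntp true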

Now, what about ZeroTier, given they posted it to their Twitter? Well, it worked. It did the interconnect stuff well and, given the bandwidth limitations of a home broadband connection, it was still quite fast.

So, the question is, how fast? Well, on my Surface Book, on a WiFi connection in the house, behind a Meraki MX64 firewall, connecting to the GodBoxV2 over FTP through ZeroTier, I get the following result:

The same download over FTP direct (no ZeroTier) does the following:

So, direct FTP is faster… in this instance by about 70%, but over the course of the download it did get slower (I saw it hit 12 at one stage), and because it’s over WiFi, those numbers are a bit wonky…

I did get one last screenshot:

As you can see, the ZeroTier network adapter is showing 77.3Mbps, but the main network adapter is showing 80.8Mbps. There would be other traffic in there, but if we assume there is nothing but ZeroTier traffic being sent, that is about 5% overhead.

So, to wrap up: Minio and its distributed storage system over ZeroTier needs more testing. Ideally, all hosts need to be in the same time zone, or at least have their clocks in sync… I will try to work on that soon. As for ZeroTier? I am extremely happy with them. It’s fast, easy to set up, and easy to configure. What more could you ask for? Oh, and it’s free, unless you need a pro account!

Distributed S3 data storage using Minio (and Zerotier)

So, something I have been looking into in recent times is distributed storage and, more specifically, how to use the storage in my many, many machines to protect data and also increase my usable space… There are a few projects on the market that do this (Ceph, NooBaa and Gluster all spring to mind) but some are more painful to set up than others… which brings me nicely to Minio. Minio is a 20ish MB executable you download from their site, mark as executable (on Linux or Mac boxes) and run… and you have yourself an S3-compatible storage server… Simples!

“But wait!” I hear you scream! “That’s not distributed!”. Well, yes… but it can be! Their Distributed Quick Start Guide, which is where I started with this, allows you to run a distributed copy of your data. I will let their documentation explain more, but this is what I did:

  • download the Minio server (a single executable file) on a minimum of 4 machines.
  • on each machine, run a command like the following:
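Something along these lines, going by the distributed Minio quickstart of the time (accesskey, secretkey and foldertoexport are placeholders):

# Run on each of the 4 machines; the host list must be identical everywhere
export MINIO_ACCESS_KEY=accesskey
export MINIO_SECRET_KEY=secretkey
./minio server http://server1/foldertoexport http://server2/foldertoexport \
               http://server3/foldertoexport http://server4/foldertoexport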

Replace accesskey and secretkey with your keys (check the Minio documentation on how to get these) and foldertoexport with, well, the folder you want to export!

For me, I have 4 servers currently clustered: 2 are in online.net (one in Paris, one in Amsterdam), 1 is in OVH.NET (France, somewhere) and one is in Dublin (the GodBoxV2, currently). They are all interconnected using ZeroTier (I will explain that later) and so far, so good… I have only run some basic tests, but with it, I could lose 2 machines and still have my data… Not bad for free! I will run some speed tests soon.

ZFS Home storage pool

Over the weekend, the BTRFS pool for my /home directory on Linux failed… Not sure what happened, but it made me do something I had wanted to do for a while: build a ZFS pool for my home dir.

First things first, the pool consists of 4× 2TB hard drives and a 128GB SSD. It’s set up in RAIDZ1 (the equivalent of RAID 5) and the SSD is used as a cache device.

To create the pool, I ran:

zpool create home raidz sda sde sdf sdg

Then, to add the cache drive:

zpool add home cache sdd

The pool (in my case) got mounted at /home, and I then restored my backup to it. To do some tests, I ran the following…
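A quick dd run is one way to do this sort of test; a rough sketch (file name and size are placeholders, and /dev/zero compresses very well on ZFS, so treat the numbers as best-case):

# Write test: stream 10GiB into the pool and sync it out
dd if=/dev/zero of=/home/ddtest bs=1M count=10240 conv=fdatasync

# Read test: drop the page cache first so we actually hit the pool
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
dd if=/home/ddtest of=/dev/null bs=1M
rm /home/ddtest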

614MB/s write and 5.3GB a second read is nothing to be sniffed at! 🙂

Hubic and Duplicity

I mentioned HubiC in my last post, and in it I said that you could use Duplicity for backups. Well, this is how you get it to work…

First, I am using Ubuntu 14.04 (I think…). I use Ubuntu in the house for a few things:

  • it’s running Tiernan’s Comms Closet, GeekPhotographer and Tiernan’s Podcast, all in house, as well as being used to build this site. The web server and MySQL server are separated: MySQL running on Windows, web on Ubuntu… but that’s a different story…
  • I have a couple of proxy servers running Ubuntu also
  • Other general servers running Ubuntu… don’t ask, because I can’t remember what they do half the time…

So, Duplicity is a backup application. From their website:

What is it?

Duplicity backs directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.

The duplicity package also includes the rdiffdir utility. Rdiffdir is an extension of librsync’s rdiff to directories—it can be used to produce signatures and deltas of directories as well as regular files. These signatures and deltas are in GNU tar format.

So, how do we get it working? Well, given that I am on Ubuntu, these are the steps I needed to take:

  • first, we need some credentials and API keys… If you haven’t signed up for HubiC, do so now… That URL gets you an extra 5GB if you sign up for free (usually 25GB); if you pay 1 EUR a month, you get 110GB (usually 100GB); and 5 EUR a month gets you a staggering 10TB (yup! Terabytes!).
  • Log in to HubiC, and in the menu go to ‘My Account’, then ‘Developers’. In here, create a new application (a name and a URL to redirect to… http://localhost seems to work correctly). Get the Client ID and Secret ID that are given to you.
  • take the contents of the following gist and fill in your own details (a sketch of the file layout is after this walkthrough)… I know, I am not a fan of sticking my password in a txt file… but it should only be on your local machine…
  • that file should be in your home directory and should be called .hubic_credentials.
  • add the duplicity PPA (https://launchpad.net/~duplicity-team/+archive/ubuntu/ppa) to Ubuntu using the add-apt-repository command (details at the link above, under ‘read about installing’). For me, I just ran ‘sudo add-apt-repository ppa:duplicity-team/ppa’.
  • install duplicity by doing ‘sudo apt-get install duplicity’. Don’t forget (it’s in the tutorial above!) to do a ‘sudo apt-get update’ first!
  • When I ran that, there were a few extra Python packages to be installed, so I was asked whether I wanted to install them… Say yes.
  • Now, to run a backup we run the following command:

duplicity ~/ cf+hubic://location

  • cf+hubic is the backend to use, ~/ is the path to back up (my home directory in this case) and location is where on HubiC we want it stored. If this doesn’t exist, not a problem… it will be created.
  • after we run this we… ahhh… I get an error:

BackendException: This backend requires the pyrax library available from Rackspace.

  • right… the pyrax library is from Rackspace and is available to download through pip…
  • I seem to have Python and a few other bits installed on this machine, so running ‘sudo pip install pyrax’ works… Your mileage may vary… [e.g., this is out of scope for this tutorial! You’re on your own!]
  • Other problem… I got a load of weird and wonderful errors like this:

AttributeError: 'Module_six_moves_urllib_parse' object has no attribute 'SplitResult'

  • I fixed these by running:

sudo pip install furl --upgrade

  • FINALLY! IT’S ALIVE!!! By default, it asks you for a passphrase for the GnuPG encryption… and it’s all good! The first backup creates the directories, required files, etc. The next time you run the command, it will only upload changes. It will also ask for the GnuPG passphrase you entered, so remember it!
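For reference, the .hubic_credentials file mentioned earlier generally looks like this (the layout is from the duplicity cf+hubic backend documentation; all the values here are placeholders):

# Create ~/.hubic_credentials with your own details
cat > ~/.hubic_credentials <<'EOF'
[hubic]
email = your_hubic_email
password = your_hubic_password
client_id = api_client_id
client_secret = api_secret_key
redirect_uri = http://localhost/
EOF
chmod 600 ~/.hubic_credentials   # keep it readable by you only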

And that’s all, folks! Any questions, leave them in the comments!

Hubic, OpenStack Swift and Curl

HubiC is an online storage site, built by the guys at OVH. They are currently offering 30GB free (if you use the link above) or, if you pay, you get 110GB (instead of the usual 100GB) for 1 EUR a month, or 10.5TB (yup… TERABYTES!) for 5 EUR a month… That’s a crazy amount of storage for a not-crazy amount of money!

So, while playing around with different things, I found they have an API, so other than the usual apps to play with (like the Hubic Apps for iPhone, Android, Windows Phone, Windows Desktop and OSX, Duplicity for backing up *nix boxes, and a few others) you can build your own…

But first, I needed to figure out how… So, after a lot of arsing around in Linux shells with curl, I finally got some stuff working!

First, I used the HubiC sandbox to get the keys… it’s quite simple to walk through… this gets you your Access Token (see step 3). Next, we need to get the endpoint from HubiC. This gist shows more:

Quick walkthrough:

The first curl request is to the HubiC API to get the credentials… this gives you a JSON response with a token and an endpoint URL, as well as an expiry time…

The next request gets you a list of all files (or at least a load of files, in my case) in your folder. The default name here is my folder… I think it’s what everyone starts out with in HubiC… if you remove it, you will see all your top-level folders.

The next request I tried was to upload a file… the filename part is where you want it to be stored, and the local file you are uploading must exist on your machine.

Finally, downloading a file… pass in the location of the file on the server (listing files will give you the location) and then -o in curl sets the output location…
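Put together, the calls look roughly like this (the token, endpoint, container and file names are all placeholders; the credentials URL is the one the HubiC API docs point you at):

# 1. Swap the OAuth access token (from the sandbox) for Swift credentials
curl -s -H "Authorization: Bearer $ACCESS_TOKEN" \
  https://api.hubic.com/1.0/account/credentials
# -> JSON containing "token", "endpoint" and "expires"

# 2. List objects in a container (the folder name from the step above)
curl -s -H "X-Auth-Token: $TOKEN" "$ENDPOINT/default"

# 3. Upload a local file to a path in that container
curl -s -X PUT -T ./photo.jpg -H "X-Auth-Token: $TOKEN" \
  "$ENDPOINT/default/photos/photo.jpg"

# 4. Download an object, writing it to a local file with -o
curl -s -H "X-Auth-Token: $TOKEN" \
  "$ENDPOINT/default/photos/photo.jpg" -o photo.jpg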

Simples! Now to get this working in C#… The full OpenStack Swift API documentation is available to show how to do more… hopefully it will help with my C# coding…

ZFS iSCSI NFS SFTP Hyper-V and more

As part of my new task to make my files safer and backups faster, and, well, cheap, I am looking into ZFS for my storage needs. My needs are as follows:

  • Allow me to store lots of different types of data (photos, videos, music, VMs) in different formats (RAW and JPG photos; MP4, AVI and DivX videos, with DVD and BluRay rips also a possibility; MP3 music; and VHD files from Hyper-V, including ISOs and snapshots). I also need to store different file systems using iSCSI (Mac and Windows clients will be mounting the storage).
  • must be safe. DO NOT LOSE DATA!
  • must be somewhat fast. I have VHDs weighing in at 100GB… my photo collection is 600GB. If I need to move or copy files to the storage system, it must be fast.

So, ZFS offers all these features. I can export storage over iSCSI, NFS, SMB, etc. All works well. But the replication stuff is the interesting part…
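For the sharing side, this is roughly what it looks like in ZFS (dataset names are placeholders; the iSCSI part depends on your platform’s target framework, e.g. targetcli on Linux or COMSTAR on illumos):

# File shares straight from ZFS properties
zfs set sharenfs=on tank/photos        # export over NFS
zfs set sharesmb=on tank/media         # export over SMB (needs the SMB service set up)

# Block storage for iSCSI: create a zvol, then export it with your target framework
zfs create -V 200G tank/hyperv-vm1     # appears as /dev/zvol/tank/hyperv-vm1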

The plan, which I am working on, is as follows:

  • have 2 machines set up: one in the house and one in a datacenter (I have a dedicated box in a Hetzner data center). Both could be VMs (the one in the datacenter will more than likely be a VM).
  • use the storage on the local system for whatever I need backed up.
  • have a script which will take a snapshot of a given pool every 4 hours or so…
  • that script should also dump the snapshot to a temporary location on the machine using ZFS send (a rough sketch of this is below).
  • that file should be checked, compressed, broken up into little bits and checked again… checking is important!
  • take those little bits and send them to the datacenter, which will do lots more checking and import the files into the ZFS pool over there…
  • there may even be a two way system to send from the datacenter back to the house…
  • finally, the remote pool should be dumped to the SFTP backup space that Hetzner give me… currently set at 100GB, but it can be increased as needed…

That’s the “plan”… Let’s see how it actually works out…
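The snapshot/ship part of the plan, roughly sketched out (pool name, paths and the remote host are placeholders, not my actual setup):

#!/bin/sh
# Snapshot the pool, dump and split the stream, checksum it, ship it off-site
POOL=tank
STAMP=$(date +%Y%m%d-%H%M)
SNAP="$POOL@auto-$STAMP"
TMP=/var/tmp/zfs-ship

zfs snapshot -r "$SNAP"
mkdir -p "$TMP"

# Dump and compress the snapshot, splitting the stream into 1GB chunks
zfs send -R "$SNAP" | gzip | split -b 1G - "$TMP/$POOL-$STAMP.part-"

# Checksum the chunks so they can be verified again after the transfer
( cd "$TMP" && sha256sum "$POOL-$STAMP".part-* > "$POOL-$STAMP.sha256" )

# Ship everything to the datacenter box
rsync -av "$TMP/" backup@datacenter.example.com:/srv/zfs-incoming/

# On the far side, after re-checking the sums:
#   cat tank-<stamp>.part-* | gunzip | zfs receive -F tankbackup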

Anyway, parts of the process I need to tweak:

  • uploading and using as much of my upload bandwidth as possible (2× 10Mb upload connections…). If I am backing up 800GB, which should be my first backup, I would like to use both pipes to the fullest… On a single connection, at 50% capacity, it would take 15.1 days to upload. If I can get both connections working at 80% capacity, giving me 16Mbit/s, it would be down to 4.7 days. With compression and deduplication, I can probably bring that down a bit more…
  • backing up to SFTP… reading different things tells me this might not be such a good idea…

Some links which you might find useful:

Understanding Storage Spaces in Windows 8 and Windows Server 2012

So, Windows Server 2012 and Windows 8 have both RTMed in the last couple of weeks and will be available to the public in the next month or so (September for Server, October for Client). If you are an MSDN subscriber, you already have the client, and will (hopefully) get Server in the next couple of weeks… Fingers crossed… Anyway, one of the interesting features I am waiting for is Storage Spaces. Tim Anderson’s Gadget Writing blog has some information on how Storage Spaces works: handy notes on what to do and what not to do.