Backup and Disaster Recovery solution at Koningin Elisabeth Instituut (KEI)
Backup solutions have changed greatly over the years, but true conceptual changes have been rare. Backup strategies commonly relied on traditional infrastructure, which meant complicated implementations and high maintenance. This led to difficult recoveries and unreliable backups. Veeam software brought many advantages and it’s a great solution for many customers, but it’s still hard to scale and complex, and it generally recovers entire virtual machines slowly.
And then came Rubrik
Rubrik, on the other hand, has all the advantages of a hyper-converged infrastructure. It consists of nodes that form a cluster and provide data protection and resiliency, with one management interface for all components. This simplifies the infrastructure and its maintenance immensely and, as a result, improves the ROI. In July, Gartner recognized Rubrik as a Visionary in the Magic Quadrant for data center backup and disaster recovery solutions. Rubrik is the only new vendor to enter this Magic Quadrant since 2014.
The most important characteristic of a backup and disaster recovery (DR) solution
What is the most important characteristic of a backup and disaster recovery (DR) solution for businesses? For backup, the answer is a short recovery point objective (RPO): recent restore points that are reliable and preferably taken without affecting the end users or the environment. For disaster recovery, it is a short recovery time objective (RTO), preferably with a simple restore procedure to follow when a disaster occurs.
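To make the two objectives concrete, here is a minimal Python sketch (not part of Rubrik) that checks a backup schedule against target RPO and RTO values. It assumes each backup captures the state at the moment it starts, so the worst-case data loss is one backup interval plus the duration of the backup in progress; the function names and example values are hypothetical.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, duration: timedelta) -> timedelta:
    # Assumption: each backup captures the state at the moment it starts. If a
    # failure hits just before the next backup completes, the newest usable
    # restore point is one full interval old, plus the time that backup ran.
    return interval + duration


def meets_objectives(interval: timedelta, duration: timedelta, restore_time: timedelta,
                     target_rpo: timedelta, target_rto: timedelta) -> bool:
    # RPO bounds how much data may be lost; RTO bounds how long the restore may take.
    return worst_case_rpo(interval, duration) <= target_rpo and restore_time <= target_rto


# Hypothetical example: backups every 4 hours that take 10 minutes,
# and a restore that takes about 15 minutes.
print(meets_objectives(timedelta(hours=4), timedelta(minutes=10), timedelta(minutes=15),
                       target_rpo=timedelta(hours=6), target_rto=timedelta(hours=1)))  # True
```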
Rubrik delivers this, and we have been able to provide these backup and DR characteristics (and much more) to one of our customers, a rehabilitation treatment center called ‘Koningin Elisabeth Instituut’ (KEI).
Rubrik in practice
Koningin Elisabeth Instituut (KEI) is situated in the coastal municipality of East Dunkirk and provides IT infrastructure to more than 300 employees. Using SecureLink’s Secure Workspace, it delivers virtual desktops to its end users. Want to know how SecureLink improved KEI’s IT environment? Check out the reference case. To deliver reliable services, a backup and DR solution was essential.
Rubrik’s architecture: how does it work?
Rubrik uses a distributed file system called Atlas, which has a distributed metadata store: Callisto. The metadata is always held on the SSD of each node.
Image courtesy of Rubrik
More information about the architecture can be found here:
- A detailed presentation about the Atlas file system: https://vimeo.com/192017460
- A Rubrik white paper about data integrity: http://pages.rubrik.com/DataIntegritywithRubrik_Registration.html
The architecture provides security in several ways. Communication between the nodes is encrypted with TLS (FIPS Level 1), using either self-signed certificates or X.509 certificates signed by a certificate authority. The 3xx series supports internal, software-based encryption at rest using AES-256. The Key Encryption Keys (KEKs) are stored on the internal Trusted Platform Module (TPM) chip or on an external KMIP server. Self-encrypting drives are available on the 5xx series.
Note: the Edge virtual appliance, which is used as a Remote Office/Branch Office (ROBO) solution, has the same architecture and functionality, but it does not support data encryption at rest.
White label hardware
Physically, one Rubrik Brik takes up only 2U of rack space and can hold a maximum of four nodes. It uses white label hardware, so no specialized or proprietary hardware components are required.
Image courtesy of Rubrik
These nodes form a cluster of at least three nodes, which is the configuration at KEI. When nodes are added, performance and storage scale linearly, so KEI can quickly add a fourth node to satisfy its growing backup needs. Increasing from three to four nodes would also automatically enable deduplication for all new data.
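Linear scaling turns capacity planning into simple arithmetic. The sketch below assumes perfectly linear scaling and uses hypothetical per-node figures rather than the specifications of any Brik model.

```python
from dataclasses import dataclass

MIN_NODES = 3  # minimum cluster size, as described above


@dataclass(frozen=True)
class NodeSpec:
    # Hypothetical per-node figures; take real values from your model's datasheet.
    usable_tb: float
    ingest_tb_per_hour: float


def cluster_totals(node: NodeSpec, node_count: int) -> tuple[float, float]:
    """Capacity and throughput of the cluster under a purely linear scaling model."""
    if node_count < MIN_NODES:
        raise ValueError(f"a cluster needs at least {MIN_NODES} nodes")
    return node.usable_tb * node_count, node.ingest_tb_per_hour * node_count


node = NodeSpec(usable_tb=10.0, ingest_tb_per_hour=1.0)  # hypothetical values
for n in (3, 4):
    print(n, cluster_totals(node, n))
```

Real clusters rarely scale perfectly linearly, so treat the result as a planning estimate rather than a guarantee.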
At KEI, the physical installation only required mounting the Brik in the rack, connecting one or two 10GbE SFP+ ports on each node, and connecting the IPMI ports. We used both 10GbE SFP+ ports, which Rubrik configures as an active/passive bond.
During system setup, mDNS is used for node-to-node discovery and configuration. Through a web user interface, the cluster is configured with a cluster name, NTP, DNS, subnets, gateways and IP addresses. Data encryption at rest can be enabled during the cluster setup. Be advised: this setting cannot be enabled or disabled after the initial setup.
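Because encryption at rest cannot be changed afterwards and the network settings must be consistent from the start, it can help to check the planned values before running the setup. A minimal pre-flight check in Python, assuming a single subnet for all nodes; the field names are hypothetical and do not correspond to the setup wizard or any Rubrik API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class ClusterSetup:
    # Hypothetical field names mirroring the settings listed above.
    cluster_name: str
    ntp_servers: list[str]
    dns_servers: list[str]
    subnet: str             # e.g. "10.0.10.0/24"
    gateway: str
    node_ips: list[str]
    encrypt_at_rest: bool   # fixed at initial setup; cannot be changed later


def validate(setup: ClusterSetup) -> list[str]:
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    net = ipaddress.ip_network(setup.subnet)  # raises ValueError if host bits are set
    if ipaddress.ip_address(setup.gateway) not in net:
        problems.append(f"gateway {setup.gateway} is outside {net}")
    for ip in setup.node_ips:
        if ipaddress.ip_address(ip) not in net:
            problems.append(f"node IP {ip} is outside {net}")
    if len(setup.node_ips) < 3:
        problems.append("a cluster needs at least three nodes")
    if len(set(setup.node_ips)) != len(setup.node_ips):
        problems.append("duplicate node IPs")
    if not setup.ntp_servers:
        problems.append("no NTP server configured")
    if not setup.dns_servers:
        problems.append("no DNS server configured")
    return problems
```

Collecting all problems at once, instead of stopping at the first, lets the planning sheet be fixed in a single pass.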
Of course, the required ports must be opened as described in the user guide. The cluster does not require internet access, but internet access does improve support capabilities through an optional remote support tunnel.
Image courtesy of Rubrik
Configuring the cluster
Rubrik automatically creates the cluster (three nodes in this case), after which the HTML5 user interface becomes available. The web UI is served by a randomly chosen node in the cluster, but it can be reached over HTTPS through the IP address of any node.
The management interface is a simple HTML5 interface built on Rubrik’s own REST API. The first step in configuring the cluster is adding a source: vCenter, AHV or Hyper-V. Multiple sources can be added. In this case, one vCenter was added, after which Rubrik loaded the entire VM inventory. The vCenter metadata is refreshed every 30 minutes, and a full refresh, including the VMDK files, occurs every 2 hours. A full refresh can also be initiated manually.
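The two refresh intervals interact in a simple way. This sketch assumes both schedules start at the same moment and that a full refresh replaces the metadata refresh due at the same time; it only illustrates the cadence and is not how Rubrik schedules refreshes internally.

```python
from typing import Optional

METADATA_REFRESH_MIN = 30  # vCenter metadata refresh interval, as described above
FULL_REFRESH_MIN = 120     # full refresh (including VMDK files) every 2 hours


def refresh_due(minute: int) -> Optional[str]:
    """Which refresh runs at `minute`, assuming both schedules start at minute 0
    and a full refresh replaces the metadata refresh due at the same moment."""
    if minute % FULL_REFRESH_MIN == 0:
        return "full"
    if minute % METADATA_REFRESH_MIN == 0:
        return "metadata"
    return None


# Count the refreshes in one day.
day = [refresh_due(m) for m in range(24 * 60)]
print(day.count("full"), day.count("metadata"))  # 12 36
```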
To take full advantage of Rubrik’s features, service credentials for the VMs must be added. These do not have to be added individually for every VM; instead, Rubrik tries each provided service credential to gain access to the guest OS. A domain administrator account could be used, but an account with administrator privileges only on the relevant VMs is preferred.
It’s important to add these credentials before the first backup of the VM. This ensures the cluster already has working credentials, which simplifies restoring files and folders when needed.
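The credential lookup described above amounts to trying each credential until one is accepted. In this sketch, `try_login` is a hypothetical callback standing in for the guest OS authentication check; it is not a Rubrik API.

```python
from typing import Callable, Optional, Sequence

Credential = tuple[str, str]  # (username, password)


def find_working_credential(
    vm_name: str,
    credentials: Sequence[Credential],
    try_login: Callable[[str, Credential], bool],
) -> Optional[Credential]:
    """Try each provided service credential in order; return the first accepted one."""
    for cred in credentials:
        # try_login is a hypothetical hook for the guest OS authentication check.
        if try_login(vm_name, cred):
            return cred
    return None  # no credential was accepted for this VM
```

Because the first working credential wins in this sketch, listing narrowly scoped accounts before a domain administrator account preserves the least-privilege preference described above.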
The first backup of a VM is a full backup; all subsequent backups are incremental. At the time of writing, no network throttling is available, so the initial full backups could stress the network if they are not monitored. If the load becomes too high, assigning a window to the initial backups can solve this. After the first full backup, Rubrik always takes incremental backups based on the blocks that changed on the host, determined before any data is transferred, which minimizes network saturation.
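The difference between the first full backup and the later incremental ones can be shown with a simplified model. The `changed_blocks` argument stands in for the changed-block information reported by the host (for VMware, this is the kind of information Changed Block Tracking provides). The sketch keeps only one up-to-date copy, whereas a backup system also keeps every earlier point in time.

```python
from typing import Optional


def take_backup(disk: list[bytes], replica: dict[int, bytes],
                changed_blocks: Optional[set[int]]) -> int:
    """Copy blocks from `disk` to `replica`; return how many blocks were transferred.

    changed_blocks: block indices the host reports as changed since the previous
    backup, or None when no previous backup exists.
    """
    if not replica or changed_blocks is None:
        to_send = list(range(len(disk)))  # first backup: every block (full)
    else:
        to_send = sorted(changed_blocks)  # later backups: changed blocks only
    for i in to_send:
        replica[i] = disk[i]
    return len(to_send)


disk = [b"boot", b"os", b"data", b"logs"]
replica: dict[int, bytes] = {}
print(take_backup(disk, replica, None))  # 4: full backup
disk[2] = b"data-v2"
print(take_backup(disk, replica, {2}))   # 1: incremental backup
```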
Backup capabilities differ depending on the operating system and environment. With vCenter as a source, Rubrik uses VMware Tools to take the backup.
After this quick installation and configuration, backing up VMs can begin. Rubrik uses SLA Domains, to which VMs and other objects can be assigned. This simplifies management while retaining flexibility and customization. Because a backup frequency is specified instead of a fixed time, Rubrik can decide intelligently when to schedule backups. A window can be configured to restrict when the first full backup or the subsequent incremental backups run.
Image courtesy of Rubrik
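Frequency-based scheduling combined with a backup window can be sketched as follows. This hypothetical helper computes the earliest start time that satisfies both constraints; Rubrik’s own scheduler makes this decision itself, and the sketch assumes the window does not cross midnight.

```python
from datetime import datetime, time, timedelta


def next_backup_start(last_backup: datetime, frequency: timedelta,
                      window_start: time, window_end: time) -> datetime:
    """Earliest start that respects both the frequency and a daily backup window.

    Assumes the window does not cross midnight (window_start < window_end).
    """
    due = last_backup + frequency
    opens = datetime.combine(due.date(), window_start)
    closes = datetime.combine(due.date(), window_end)
    if due < opens:
        return opens   # due before the window: wait for it to open
    if due < closes:
        return due     # due inside the window: start right away
    # Due after today's window has closed: wait for tomorrow's window.
    return datetime.combine(due.date() + timedelta(days=1), window_start)
```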
By default, three SLA Domains are present: Gold, Silver and Bronze. These can be customized, and new, tailored SLA Domains can be created. Each SLA Domain has specific policies assigned:
- Backup frequency and retention period
- Replication to another cluster (physical or cloud)
- Archiving to the cloud (Amazon S3 or Azure), NFS or tape
- When archiving and replicating, security is ensured by using AES-256 encryption
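The policies listed above map naturally onto a small data model. This is an illustrative Python sketch, not Rubrik’s schema; the frequency and retention values in the example are hypothetical and are not the defaults of the Gold, Silver and Bronze SLA Domains.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SlaDomain:
    name: str
    frequency: timedelta                      # how often a backup is taken
    retention: timedelta                      # how long each backup is kept
    replication_target: Optional[str] = None  # another Rubrik cluster, if any
    archive_target: Optional[str] = None      # e.g. Amazon S3, Azure, NFS or tape

    def snapshots_retained(self) -> int:
        """Approximate number of backups kept once the retention period is filled."""
        return int(self.retention / self.frequency)


# Hypothetical values and target name, not the defaults of the built-in Gold SLA Domain.
gold = SlaDomain("Gold", frequency=timedelta(hours=4), retention=timedelta(days=30),
                 replication_target="dr-cluster")
print(gold.snapshots_retained())  # 180
```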
When another Rubrik cluster is present for replication, the SLA Domains on that second cluster are listed on the local cluster as ‘Remote SLA Domains’.
SLA Domains can be assigned to specific objects or to vCenter folders, hosts or clusters.
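When objects are assigned at several levels, some rule must decide which SLA Domain applies. The sketch below assumes that the most specific assignment wins and models the inventory as a single hypothetical parent chain (VM, folder, host, cluster); it is an illustration, not Rubrik’s documented resolution logic.

```python
from typing import Optional


def effective_sla(obj: str, parent_of: dict[str, str],
                  assigned: dict[str, str]) -> Optional[str]:
    """Return the SLA Domain that applies to `obj`.

    Walks up a single hypothetical inventory chain (VM -> folder -> host -> cluster)
    and assumes the most specific explicit assignment wins.
    """
    seen = set()
    node: Optional[str] = obj
    while node is not None and node not in seen:  # `seen` guards against cycles
        if node in assigned:
            return assigned[node]
        seen.add(node)
        node = parent_of.get(node)
    return None  # nothing assigned anywhere up the chain
```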
It’s important to know that if the retention period of an SLA Domain is increased, the change affects not only future backups but also previously taken backups. A different approach and mindset are therefore needed when discussing a backup strategy based on Rubrik.
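One way to see why a retention change also affects earlier backups: if expiry is computed from the SLA Domain’s current retention each time it is evaluated, rather than fixed when a backup is taken, a longer retention immediately covers older backups too. A minimal sketch under that assumption; it only applies to backups that still exist, since a purged backup cannot be brought back.

```python
from datetime import datetime, timedelta


def retained(snapshot_times: list[datetime], now: datetime,
             retention: timedelta) -> list[datetime]:
    """Backups kept under the *current* retention setting.

    Expiry is evaluated against the policy at query time instead of being
    stamped on each backup when it was taken.
    """
    return [t for t in snapshot_times if now - t <= retention]


snapshots = [datetime(2017, 9, day) for day in range(1, 11)]  # one per day, Sept 1-10
now = datetime(2017, 9, 10)
print(len(retained(snapshots, now, timedelta(days=3))))  # 4
print(len(retained(snapshots, now, timedelta(days=7))))  # 8
```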
Reports, analytics, and search
With email reporting, Rubrik can be used without constantly worrying about the reliability of the backups. The reports can list SLA Domain compliance, storage usage, the amount of data transferred, the daily growth of the backup data and more.
Combined with graphs of storage usage over time and an estimated storage runway (how long the remaining capacity will last at the current growth rate), this gives the backup admin time to react before the cluster runs into problems. Nodes can be added or archiving configured long before storage is full.
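A storage runway estimate can be as simple as extrapolating the average daily growth. This is a linear sketch with hypothetical inputs, not Rubrik’s forecasting method.

```python
from typing import Sequence


def runway_days(used_tb_by_day: Sequence[float], capacity_tb: float) -> float:
    """Days until the cluster is full, extrapolating the average daily growth linearly."""
    if len(used_tb_by_day) < 2:
        raise ValueError("need at least two daily samples")
    avg_growth = (used_tb_by_day[-1] - used_tb_by_day[0]) / (len(used_tb_by_day) - 1)
    if avg_growth <= 0:
        return float("inf")  # usage flat or shrinking: no projected end date
    return (capacity_tb - used_tb_by_day[-1]) / avg_growth


print(runway_days([20.0, 20.5, 21.0, 21.5, 22.0], capacity_tb=42.0))  # 40.0
```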
Thanks to the Atlas file system, predictive search is available for the names of all backed-up objects. This is convenient for day-to-day management, and in an emergency there is no need to browse a long list to find a particular VM.
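Type-ahead search over object names boils down to prefix lookups. The class below illustrates the idea with a sorted list and Python’s `bisect` module; it is a stand-in for illustration, not how predictive search is implemented on Atlas.

```python
import bisect


class PrefixIndex:
    """Sorted list of object names supporting fast, case-insensitive prefix lookups."""

    def __init__(self, names: list[str]):
        # Keep (lowercased, original) pairs sorted so matching names are contiguous.
        self._entries = sorted((n.lower(), n) for n in names)
        self._keys = [k for k, _ in self._entries]

    def suggest(self, prefix: str, limit: int = 10) -> list[str]:
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)  # first key >= prefix
        results = []
        for key, name in self._entries[start:]:
            if not key.startswith(p) or len(results) == limit:
                break
            results.append(name)
        return results


idx = PrefixIndex(["FS01-Fileserver", "fs02-archive", "SQL01", "DC01"])
print(idx.suggest("fs"))  # ['FS01-Fileserver', 'fs02-archive']
```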
Live Mount makes it possible to quickly spin up a backup as a VM. Thanks to the performance of the hyper-converged infrastructure, this takes only a few minutes. With Live Mount, the user selects a point in time of a virtual machine and the desired host, and Rubrik serves as the datastore. The user can choose whether the VM is powered on or off and can also select the state of its NIC. The new VM then appears in vCenter. If desired, its datastore can later be changed from Rubrik to storage on the desired host.
If a cryptolocker encrypts, for example, a file server, the administrator can bring the server back online within minutes using Live Mount. Rubrik’s file system is immutable, which protects all backups against cryptolockers: no backup can be lost, even when using Live Mount. This gives the administrator great confidence in the recovery capabilities, even when facing a cryptolocker.
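Before using Live Mount after a cryptolocker infection, the administrator has to pick a point in time from before the encryption started. A hypothetical helper for that selection, assuming the backup times are sorted and an estimated infection time is known; the mount itself is outside the scope of this sketch.

```python
import bisect
from datetime import datetime
from typing import Optional, Sequence


def last_clean_snapshot(snapshots: Sequence[datetime],
                        infected_at: datetime) -> Optional[datetime]:
    """Newest backup taken strictly before the estimated infection time.

    `snapshots` must be sorted in ascending order.
    """
    i = bisect.bisect_left(snapshots, infected_at)  # count of backups before infected_at
    return snapshots[i - 1] if i > 0 else None


snaps = [datetime(2017, 9, 1, h) for h in (0, 4, 8, 12, 16, 20)]
print(last_clean_snapshot(snaps, datetime(2017, 9, 1, 10, 30)))  # 2017-09-01 08:00:00
```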
Many targets are supported, including but not limited to:
- ESXi 5.1, 5.5, 6.0, 6.5
- Hyper-V 2008 R2 with connectors, 2016 natively with WMI and RCT
- Nutanix AHV 5.1 or higher
- Physical/Virtualized Linux RHEL 5/6/7, CentOS 5/6/7, Oracle Linux 5/6/7, and SUSE 11 SP4
- Physical/Virtualized Windows 2008 R2, Windows 2012 and 2012 R2, Windows 2016
- SQL databases with transaction log backup. Rubrik 4.0 (Alta) added Live Mount for SQL databases.
- Oracle RMAN (Oracle Database 12c R1 (12.1.0), 11g R2 (11.2.0) and 10g R2; ASM and RAC supported)
- Application-aware backup and recovery is available through Microsoft VSS integration for Microsoft Windows 2012/2008 R2, Microsoft Exchange Server 2010/2013, Microsoft SharePoint 2013, Microsoft SQL Server 2008/2008 R2/2012/2014, Microsoft Active Directory in Windows Server 2012/2008 R2
Rubrik provides an easily manageable yet feature-rich backup solution; many of its features were not covered here. It takes backup and disaster recovery to a new level, with features we did not previously associate with backup. We look forward to the next Rubrik release to provide an even better solution to our customers.