Bacula Sizing and Components Distribution
Because Bacula is a distributed system, it can be deployed in several arrangements. For small environments, a single machine can probably host all Bacula components: the Director, the Storage Daemon, the File Daemon (the client the server uses to back itself up), the Catalog database, and the web interfaces, such as bweb (Enterprise) and Baculum (Community). For larger environments, the Storage Daemons might be installed on separate machines to balance the backup workload.
The suggested distribution of Bacula components is as follows:
|# of Clients||25||50||200||500||2000||5000|
|Director + Catalog + Storage Daemon Machine||1||1||1||1||1||1|
|Storage Daemon Extra Machines||0||0||0||1||3||9|
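The table above can be read as a simple lookup. The following is a minimal sketch, assuming that a client count falling between two columns is rounded up to the next column; that rounding rule, the function name and the constants are illustrative and not part of Bacula.

```python
from bisect import bisect_left

# Columns of the distribution table above.
CLIENT_TIERS = [25, 50, 200, 500, 2000, 5000]
EXTRA_SD_MACHINES = [0, 0, 0, 1, 3, 9]


def extra_storage_daemons(clients: int) -> int:
    """Return the suggested number of extra Storage Daemon machines.

    Assumption: counts between two table columns use the next larger column.
    """
    if clients < 1:
        raise ValueError("clients must be a positive integer")
    idx = bisect_left(CLIENT_TIERS, clients)
    if idx == len(CLIENT_TIERS):
        raise ValueError("client count is beyond the range covered by the table")
    return EXTRA_SD_MACHINES[idx]


if __name__ == "__main__":
    for n in (25, 500, 1200, 5000):
        print(n, "clients ->", extra_storage_daemons(n), "extra SD machine(s)")
```

Counts above 5000 raise an error rather than extrapolate, since the table gives no guidance beyond that point.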
Bacula should always run on 64-bit operating systems, which can be virtualized. However, the resources required by the backup server and client machines depend heavily on the Bacula features in use, such as block-level deduplication, encryption, and compression, as well as on backup size, the number of clients, and the number of simultaneous backup jobs. Some resource sizing considerations follow.
The Bacula Director daemon does not require many resources and has been tested with more than 10,000 attached backup clients. It hosts the backup configurations and provides daemon authentication and job scheduling. That is why the Director is usually hosted together with the web interface, the Catalog, and even Storage Daemons. A backup client (File Daemon) is always needed on this machine, since it lets the system back itself up.
The Bacula backup metadata Catalog should preferably be PostgreSQL. The database size depends largely on the number of backed-up files and on backup retention times, since each file is an entry in the database (250 bytes on average). A CentOS 7 or Windows Server 2016 installation has 140,000 files on average. If we assume 200,000 files to allow for system growth, applications that create many files, and differential backups, each fully backed-up operating system occupies 50 MB of database space per full backup job (200,000 files * 250 bytes), multiplied by the number of full backups retained by your backup policy.
50MB * Number_full_machines * Number_full_retained_jobs = Required_DB_disk_space
For example, full backups of 25 operating systems with 4 weekly and 12 monthly retained full backups (16 in total) would require 20 GB of database space (50 MB * 25 * 16).
Of course, the requirements are much lower if only application data and databases are backed up.
The Catalog and OS partitions can be hosted on LVM and benefit greatly from disks with fast I/O such as SSDs, especially for medium and larger Bacula systems (e.g. more than 100 backup clients).
Using SSDs might also reduce the amount of RAM required, since disk access is not that much slower (and RAM is many times more expensive). Otherwise, the ideal total physical RAM should be larger than your database size on disk in order to minimize I/O.
The suggested RAM and CPU for a machine that hosts the Director and the Catalog database on very fast SSD or NVMe disks are given in the following table.
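The catalog sizing rule can be checked with a few lines of Python. This is only a sketch of the arithmetic given in the text (250 bytes per file entry, 200,000 files per machine); the function and constant names are illustrative, and decimal units (1 MB = 1,000,000 bytes) are assumed.

```python
# Figures quoted in the text; names are illustrative, not part of Bacula.
BYTES_PER_FILE_ENTRY = 250      # average catalog entry size
FILES_PER_MACHINE = 200_000     # padded per-machine file count


def catalog_size_mb(machines: int, retained_full_jobs: int,
                    files_per_machine: int = FILES_PER_MACHINE) -> float:
    """Estimated catalog size in MB (assumes 1 MB = 1,000,000 bytes)."""
    per_full_backup_mb = files_per_machine * BYTES_PER_FILE_ENTRY / 1_000_000
    return per_full_backup_mb * machines * retained_full_jobs


if __name__ == "__main__":
    # 25 machines, 4 weekly + 12 monthly retained full backups
    size_mb = catalog_size_mb(25, 4 + 12)
    print(f"{size_mb / 1000:.0f} GB")  # prints "20 GB"
```

Passing a smaller `files_per_machine` gives a rough estimate for clients where only application data is backed up.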
|# Backup Clients||25||50||200||500||2000||5000|
Bacula Storage Daemon (SD)
The Storage Daemon has been tested serving up to 500-600 backup clients. It depends on CPU capacity and a good network connection (Gigabit, bonded Gigabit or 10 Gbit), since it receives data from the backup clients and stores it on the NAS, tape library or any other kind of backup storage. The connection to these devices, such as SCSI, iSCSI or FC, must also have ample capacity.
XFS with LVM, or ZFS, are good candidates for the file system that will host disk-based backups. Initially, the only space the SD requires is that of the operating system; the extra space for backup storage will probably be provided by additional mounted disks.
If using deduplication, however, a small extra, very fast disk (preferably SSD) must be provided to host the dedup index files. The Global Deduplication backup speed will depend largely on that disk's I/O. The suggested amount is 20 GB of SSD per 1 TB of backups.
The disks that hold the deduplicated data containers can be slower, but RAID-6 (or greater) is especially recommended, since a single data block serves multiple backups and must not be lost. The required size depends on how well your data deduplicates, how much it changes, and backup retention times, but it is safe to say that Bacula deduplication is as good as or better than that of any other backup software with deduplication.
Also, the amount of RAM only matters if deduplication is used. With unlimited resources, it should be at least the size of the dedup index files. Since this is not always possible, try to have 1.3 GB of RAM for each 1 TB of backups. If this is still not possible, use the fastest SSD or NVMe disks available to host the dedup index files as an acceptable trade-off.
The suggested resources for a machine running the Bacula Storage Daemon with deduplication are as follows:
|Total Backup Size TB||10||20||50||100||300||500|
|SSD Disk for Dedup Index TB||0.2||0.4||1||2||6||10|
|Slower Disk for Dedup Container TB||15||30||50||150||450||750|
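The two per-terabyte rules from this section (20 GB of SSD for the dedup index and 1.3 GB of RAM per 1 TB of backups) can be applied to any total backup size. The sketch below only encodes those two ratios; the function name and dictionary keys are illustrative, and the container disk size is left out because the text says it depends on how well the data deduplicates.

```python
# Ratios quoted in the text; names are illustrative, not part of Bacula.
SSD_INDEX_GB_PER_TB = 20    # SSD for dedup index files per TB of backups
RAM_GB_PER_TB = 1.3         # suggested RAM per TB of backups


def dedup_sd_resources(total_backup_tb: float) -> dict:
    """Suggested dedup index SSD size and RAM for a Storage Daemon, in GB."""
    if total_backup_tb < 0:
        raise ValueError("total_backup_tb must not be negative")
    return {
        "ssd_index_gb": total_backup_tb * SSD_INDEX_GB_PER_TB,
        "ram_gb": total_backup_tb * RAM_GB_PER_TB,
    }


if __name__ == "__main__":
    for tb in (10, 100, 500):
        res = dedup_sd_resources(tb)
        print(f"{tb} TB -> {res['ssd_index_gb']:.0f} GB SSD index, "
              f"{res['ram_gb']:.0f} GB RAM")
```

For 10 TB this gives 200 GB (0.2 TB) of SSD, which matches the first column of the table.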
Bacula File Daemon
The Bacula client uses minimal CPU and memory for standard backups.
If using Global Deduplication, encryption and/or the Accurate mode feature (required by the vSphere, Oracle, PostgreSQL PITR and other plugins), extra resources are suggested.
The following extra resources are recommended for the Client machines:
|Client Backup Speed MB/s||100||400||1000|
|CPU GHz||3 GHz||12 GHz||30 GHz|
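The three columns of the client table all work out to 3 GHz of CPU per 100 MB/s of backup speed. The sketch below assumes that this ratio also holds for speeds between and around the listed values, which the table does not state; the function and constant names are illustrative.

```python
# Ratio derived from the client table: 3 GHz per 100 MB/s of backup speed.
# Assumption: the relation stays linear for speeds not listed in the table.
GHZ_PER_100_MB_S = 3.0


def client_extra_cpu_ghz(backup_speed_mb_s: float) -> float:
    """Suggested extra aggregate CPU (GHz) for a client at a target speed."""
    if backup_speed_mb_s < 0:
        raise ValueError("backup_speed_mb_s must not be negative")
    return backup_speed_mb_s / 100 * GHZ_PER_100_MB_S


if __name__ == "__main__":
    for speed in (100, 400, 1000):
        print(f"{speed} MB/s -> {client_extra_cpu_ghz(speed):.0f} GHz")
```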