Galaxy CloudMan Capacity Planning for Amazon Web Services
This page offers advice on how much cloud infrastructure you will need to run your Galaxy instance on Amazon Web Services (AWS). See the general capacity planning page for advice that applies across different cloud infrastructures.
- Amazon's Elastic Compute Cloud (EC2) provides the compute part of its cloud. The number of CPUs and the amount of memory an instance has are determined by that instance's EC2 instance type.
- Amazon's Elastic Block Store (EBS) provides virtual disk drives for EC2 instances.
- Amazon's Simple Storage Service (S3) is "storage for the Internet." It provides a web services interface to network-accessible storage. Galaxy cloud instances do not use it at runtime, but it can be used to store archives (snapshots) of EBS virtual disks.
Which EC2 instance type(s) should you use for your Galaxy?
|Usage Scenario||Head Node||Worker Nodes|
|1: Light usage||Standard Large or Extra Large||Standard Large or Extra Large|
|2: Occasional heavy||High-Memory Double or Quadruple Extra Large||High-Memory Extra Large|
|3: Continuous variable||High-Memory Double or Quadruple Extra Large||High-Memory Extra Large|
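If you provision Galaxy from a script, the scenario table above can be kept as data rather than re-typed. The following is a minimal Python sketch, assuming you only need the human-readable instance type descriptions from the table; `RECOMMENDATIONS` and `recommend` are hypothetical names and are not part of CloudMan or any AWS API.

```python
"""Hypothetical lookup of head/worker instance recommendations per usage scenario.

Not a CloudMan API: the names here are illustrative only, and the values are the
instance type descriptions from the scenario table, not AWS API identifiers.
"""

# scenario number -> (scenario name, head node type, worker node type)
RECOMMENDATIONS = {
    1: ("Light usage", "Standard Large or Extra Large", "Standard Large or Extra Large"),
    2: ("Occasional heavy", "High-Memory Double or Quadruple Extra Large", "High-Memory Extra Large"),
    3: ("Continuous variable", "High-Memory Double or Quadruple Extra Large", "High-Memory Extra Large"),
}


def recommend(scenario: int) -> dict:
    """Return the recommended head and worker instance types for scenario 1, 2, or 3."""
    try:
        name, head, worker = RECOMMENDATIONS[scenario]
    except KeyError:
        raise ValueError(f"unknown scenario {scenario!r}; expected 1, 2, or 3") from None
    return {"scenario": name, "head": head, "worker": worker}


if __name__ == "__main__":
    # Print the whole table, one scenario per line.
    for number in sorted(RECOMMENDATIONS):
        rec = recommend(number)
        print(f"{number}: {rec['scenario']}: head={rec['head']}; workers={rec['worker']}")
```

Keeping the recommendations in one dictionary means a script only needs a single edit if the table changes.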
|Instance Type||Recommended for (Scenario: node role)||Comments|
|Micro||None||Galaxy may start on these instances, but it cannot run any analyses.|
|Large||1: head and worker nodes||Recommended for Scenario 1 (light usage), head and worker nodes.|
|Extra Large||1: head and worker nodes (Standard); 2 & 3: worker nodes (High-Memory)||Recommended worker node for Scenarios 2 & 3 (heavy or variable usage).|
|High-Memory Double Extra Large||2 & 3: head node||Recommended head node for Scenarios 2 & 3 (heavy or variable usage).|
|High-Memory Quadruple Extra Large||2 & 3: head node||The Galaxy Team uses this as the head node in workshops that run TopHat. It supports about 30 concurrent TopHat jobs without significant slowdown, whereas the Double Extra Large instance gets bogged down.|
|Cluster (any)||Not supported||CloudMan does not support Cluster instance types.|
Related resources:
- "Reducing costs in Cloud Galaxy" thread, Galaxy-Dev mailing list, started 2012/03/18.
- The CloudHarmony Benchmarks page.
How much storage will your Galaxy need?
Galaxy CloudMan comes with two standard volumes:
- Tools Volume (10 GB): contains the tools used by the instance.
- Indices Volume (700 GB): reference data for a number of species.
In addition, you will need a data volume to hold the data used and produced by your analyses. You don't control the sizes of the tools and indices volumes, but you do specify the size of the data volume at setup time, and that size should be driven by the size of your datasets. Unfortunately, we don't have any hard-and-fast guidelines or multipliers for estimating how much space you will need from the size of your datasets.
For Scenario 1 (light usage), it is fine to specify a large data volume (up to the 1 TB maximum). However, for Scenarios 2 and 3, where the storage may or will persist for a long time, allocating too much storage can incur significant cost: AWS charges by the hour for the storage you allocate, not for the storage you actually use.
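Because billing is based on allocated rather than used size, the cost of an oversized data volume is easy to estimate. The Python sketch below assumes a hypothetical price of $0.10 per GB-month and approximates a month as 730 hours; neither figure comes from this page, so substitute current EBS pricing for your region. The function names are illustrative only.

```python
"""Sketch: estimating the cost of over-allocating an EBS data volume.

Assumptions (hypothetical, not AWS's actual price list):
- PRICE_PER_GB_MONTH is an illustrative rate; check current EBS pricing.
- A month is approximated as 730 hours, and charges are prorated by the hour.
"""

HOURS_PER_MONTH = 730.0     # approx. 24 * 365 / 12
PRICE_PER_GB_MONTH = 0.10   # hypothetical USD per GB-month


def allocated_cost(allocated_gb: float, hours: float,
                   price_per_gb_month: float = PRICE_PER_GB_MONTH) -> float:
    """Charge for keeping `allocated_gb` provisioned for `hours`.

    Only the allocated size appears in the formula: how much of the volume
    is actually filled with data does not change the bill.
    """
    if allocated_gb < 0 or hours < 0:
        raise ValueError("size and duration must be non-negative")
    return allocated_gb * price_per_gb_month * hours / HOURS_PER_MONTH


def overallocation_cost(allocated_gb: float, used_gb: float, hours: float,
                        price_per_gb_month: float = PRICE_PER_GB_MONTH) -> float:
    """Extra spend caused by allocating more than the `used_gb` actually needed."""
    if used_gb > allocated_gb:
        raise ValueError("used_gb cannot exceed allocated_gb")
    return (allocated_cost(allocated_gb, hours, price_per_gb_month)
            - allocated_cost(used_gb, hours, price_per_gb_month))


if __name__ == "__main__":
    six_months = 6 * HOURS_PER_MONTH
    # Long-lived volume (Scenario 2 or 3) that holds only 100 GB of data.
    total = allocated_cost(1000, six_months)
    waste = overallocation_cost(1000, 100, six_months)
    print(f"1000 GB for 6 months: ${total:.2f} (${waste:.2f} of it for unused space)")
```

Under these assumptions, a 1,000 GB volume kept for six months but holding only 100 GB of data costs $600, of which $540 pays for empty space. The same arithmetic applies at any real price; only the ratio of allocated to used space matters for the waste.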