Galaxy Community Hub

With Table Compute you can:

  • drop or duplicate given rows or columns from a tabular dataset
  • filter rows or columns on their properties
  • replace or transform table elements
  • compute tables of summary statistics across rows or columns

It is even possible to manipulate individual tables or combine multiple ones using pandas/numpy-based custom expressions!

In short, whatever functionality for manipulating tabular data you may have missed in Galaxy is likely provided by this tool, so give it a try in your next analysis. We are looking forward to your feedback!

Some ideas what you may use the tool for include

  • transpose a tabular dataset
  • log-transform the data, but add a small offset to zero-values first
  • subtract a table from another
  • normalize rows or columns

To get you started, here are some concrete step-by-step examples taken directly from the help section of the tool:

Example 1: Sub-selecting from a table

We have the following table:

c1c2c3
g1102030
g2369
g34812
g48163
{: .table}

and we want to duplicate c1 and remove c2. Also select g1 to g3 and add g2 at the end as well. This would result in the output table:

c1c1c3
g1101030
g2339
g34412
g2339
{: .table}

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationDrop, keep or duplicate rows and columns
    • List of columns to select1,1,3
    • List of rows to select1:3,2
    • Keep duplicate columnsYes
    • Keep duplicate rowsYes

Example 2: Filter for rows with row sums less than 50

We have the following table:

c1c2c3
g1102030
g2369
g34812
g48163
{: .table}

and we want:

c1c2c3
g2369
g34812
{: .table}

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationFilter rows or columns by their properties
    • FilterRows
    • Filter CriterionResult of function applied to columns/rows
      • Keep column/row if its observedSum
      • is< (Less Than)
      • this value50

Example 3: Count the number of values per row smaller than a specified value

We have the| following table:

c1c2c3
g1102030
g2369
g34812
g48163
{: .table}

and we want to count how many elements in each row are smaller than 10, i.e., we want to obtain the following results table:

vec
g10
g23
g32
g42
{: .table}

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationManipulate selected table elements
    • Operation to performCustom
      • Custom Expression on ‘elem’elem < 10
    • Operate on elementsAll

Note: There are actually simpler ways to achieve our purpose, but here we are demonstrating the use of a custom expression.

After executing, we would then be presented with a table like so:

c1c2c3
g1FalseFalseFalse
g2TrueTrueTrue
g3TrueTrueFalse
g4FalseTrueTrue
{: .table}

To get to our desired table, we would then process this table with the tool again:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationCompute Expression across Rows or Columns
    • CalculateSum
    • For eachRow

Executing this will sum all the ‘True’ values in each row. Note that the values must have no extra whitespace in them for this to work (e.g. ‘True ’ or ’ True’ will not be parsed correctly).

Example 4: Perform a scaled log-transformation conditionally

We want to perform a scaled log transformation on all values greater than 5, and set all other values to 1.

We have the following table:

c1c2c3
g102030
g2309
g3480
g48100
{: .table}

and we want:

c1c2c3
g11.000000000.14978660.1133732
g21.000000001.00000000.2441361
g31.000000000.25993021.0000000
g40.054252461.00000001.0000000
{: .table}

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationManipulate selected table elements
    • Operation to performCustom
      • Custom Expression → :

        (math.log(elem) / elem) if (elem > 5) else 1
      • Operate on elementsAll

Example 5: Perform a Full table operation

We have the following table:

c1c2c3
g1102030
g23109
g34810
g4811010
{: .table}

and we want to subtract from each column the mean of that column divided by the standard deviation of it to yield:

c1c2c3
g19.35173717.78435328.550737
g22.3517377.7843537.550737
g33.3517375.7843538.550737
g480.3517377.7843538.550737
{: .table}

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationPerform a Full Table Operation
    • OperationCustom

    • Custom Expression on ‘table’ along axis (0 or 1) → :

      table - table.mean(0)/table.std(0)

Example 6: Perform operations on multiple tables

We have the following three input tables:

Table 1

c1c2c3
g1102030
g23109
g34810
{: .table}

Table 2

c1c2
g112
g234
g365
{: .table}

Table 3

c1c2c3
g1123
g2123
{: .table}

Note that the dimensions of these tables do not match.

Dimensions: : - Table1 [3,3] - Table2 [3,2] - Table3 [2,3]

In order to perform simple operations between Tables, they must be of the same dimensions.

To add Table2 to Table3 we would have to transpose one of the tables using the in-built T method:

table2 + table3.T

or:

table2.T + table3

We can also perform more general operations using all 3 tables, such as taking the minimum value of the maximum values of Table2 and Table3, and dividing the Table1 values by it:

table1 / min(table2.values.max(), table3.values.max())

To perform these types of operations in Galaxy we would select the following:

  • Input Single or Multiple TablesMultiple Tables

  • (For each inserted table) : - Column names on first row?Yes - Row names on first column?Yes

  • Custom Expression → :

    <insert your desired function>

Please note that the last example shown above was chosen to illustrate the limitations of the tool. Nested attributes like table2.values.max are disallowed in expressions in the tool so the above would have to be replaced with the harder to read workaround:

table1 / min(np.max(np.max(table2)), np.max(np.max(table3)))

Also note that, currently min(), max() and sum() are the only built-in Python functions that can be used inside expressions. If you want to use additional functions, these have to be qualified functions from the math, np or pd libraries.

Example 7: Melt

We have the following table

ABC
0aB1
1bB3
2cB5
{: .table}

and we want:

Avariablevalue
0aBB
1bBB
2cBB
3aC1
4bC3
5cC5
{: .table}

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationPerform a Full Table Operation
    • OperationMelt
    • Variable IDs → “A”
    • Unpivoted IDs → “B,C”

This converts the “B” and “C” columns into variables.

Example 8: Pivot

We have the following table

foobarbazzoo
0oneA1x
1oneB2y
2oneC3z
3twoA4q
4twoB5w
5twoC6t
{: .table}