Virtual Tables

One advantage of the Apache Cassandra design is the ability to store data that is denormalized into a fewer number of tables. By taking advantage of nested data structures such as sets, lists, and maps, transactions can be simplified. However, the JDBC interface does not natively support accessing this type of data. By renormalizing the data contained within collections (sets, lists, and maps) into virtual tables, the Simba Cassandra JDBC Connector allows users to directly interact with the data but leave the storage of the data in its denormalized form in Cassandra.

If a table contains any collection columns, when the table is queried for the first time, the connector creates the following virtual tables:

  • A base table, which contains the same data as the real table except for the collection columns.
  • A virtual table for each collection column, which expands the nested data.

Virtual tables refer to the data in the real table, enabling the connector to access the denormalized data. By querying the virtual tables, you can access the contents of Cassandra collections via JDBC. When you write or modify data in a virtual table, the data in the real table in the Cassandra database is updated.

The base table and virtual tables appear as additional tables in the list of tables that exist in the database. The base table uses the same name as the real table that it represents. The virtual tables that represent collections are named using the name of the real table, a separator (_ vt_ by default), and the name of the column.

For example, consider the following table. ExampleTable is a Cassandra database table that contains an integer primary key column named pk_int, a list column, a map column, and a set column (named StringSet).

pk_int List Map StringSet

1

["1", "2" , "3"]

{ "S1" : "a", "S2" : "b" }

{ "A", "B", "C" }

3

["100", "101", "102", "105"]

{ "S1" : "t" }

{ "A", "E" }

The connector would generate multiple virtual tables to represent this single table. The first virtual table is the base table, which is shown below.

pk_int

1

3

The base table contains the same data as the original database table except for the collections, which are omitted from this table and expanded in other virtual tables.

The following three tables show the virtual tables that renormalize the data from the List, Map, and StringSet columns.

pk_int List#index List#value

1

0

1

1

1

2

1

2

3

3

0

100

3

1

101

3

2

102

3

3

105

pk_int Map#key Map#value

1

S1

a

1

S2

b

3

S1

t

pk_int StringSet#value

1

A

1

B

1

C

3

A

3

E

The foreign key columns in the virtual tables reference the primary key columns in the real table, and indicate which real table row the virtual table row corresponds to. The columns with names that end with #index or #key indicate the position of the data within the original list or map. The columns with names that end with #value contain the expanded data from the collection.

The data in the virtual tables can be selected, inserted, and updated as if they were normal tables, and the connector handles the storage details within Cassandra. You can also explicitly append data to the end of a list by inserting a row of data with the index column set to -1.

For example, to append 106 to the List column in ExampleTable, where pk_int = 3, use the following query:

INSERT INTO "ExampleTable_vt_List" (pk_int, "List#index", "List#value") VALUES (3, -1, '106')