Sampling Data for DocumentDB

You can use the options in the Sample View to specify how the connector samples DocumentDB data in order to generate a schema definition. The connector samples the data in order to detect its structure and determine the data mappings that best support the data.

When sampling your DocumentDB data to create a schema definition, the Schema Editor automatically identifies columns, assigns data types, and creates virtual tables for complex data types. To further refine your schema, you can use the following methods:

  • Collection Mapping (default): The connector samples the data source and creates a single table schema.
  • Table Delimiters: You can specify one or more attributes from the data, and the connector creates separate tables for each unique value of those selected attributes. Optionally, you can choose specific values for each attribute, and tables are only created for that subset of values. With this option, any unspecified values are excluded from the schema.
  • Note:

    Collection Mapping and Table Delimiters are mutually exclusive. You can only sample a collection using one of these methods.

  • Views: In addition to the two above methods, you can specify one or more views and the connector represents each as a separate table in the schema. A view is defined using the query language of the data source. For example:

    SELECT c.Name

    FROM AddressBook a

    JOIN c IN a.children

    WHERE a.id = '123'

    ORDER BY a.address.city ASC

Important:

If you are using the 32-bit Schema Editor to work with deeply-nested structures or thousands of attributes, you may encounter a Java heap space error. In this case you may need to specify larger initial and maximum values for the memory allocation pool used by the Schema Editor. To do this, launch the Schema Editor from the command line using the following command:

java -jar SchemaEditor.jar -Xms[initial]m -Xmxmaximumm

where:

  • [initial] is the initial size, in megabytes, of the memory pool
  • [maximum] is the maximum size, in megabytes, of the memory pool

For example, to specify an initial memory pool of 512 megabytes and a maximum memory pool of 2048 megabytes, use the following command:

java -jar SchemaEditor.jar -Xms512m -Xmx2048m

To sample data using Collection Mapping:

  1. Launch the Schema Editor.
  2. Select your DSN from the list, or enter an appropriate connection string.
  3. Click Connect.
  4. Under Create A New Schema Definition, click Create New.
    Schema Editor shows a list of databases and collections from the data source.
  5. Select the collection or collections you wish to sample from the list.
  6. Click Sample.

To sample data using Table Delimiters:

  1. Launch the Schema Editor.
  2. Select your DSN from the list, or enter an appropriate connection string.
  3. Click Connect.
  4. Under Create A New Schema Definition, click Create New. Schema Editor shows a list of databases and collections from the data source.
  5. Next to the collection you wish to sample, click Edit under the Mapping Method column.
  6. Set the Mapping Method to Table Delimiters.
  7. In the Attributes field, enter the Attribute you wish to use as a delimiter and press Enter. Repeat this process to add all relevant attributes.
  8. You can delete attributes by selecting them and clicking DeleteDelete icon, or Delete AllDelete All icon to remove all attributes.

    Note:

    If you do not specify values for the selected attribute or attributes, the Schema Editor creates tables for all unique values of the specified attributes, requiring it to survey the entire data source. For larger data sources this can take a significant amount of time.

  9. In the Values field, enter the unique value you want used in your schema and press Enter. Repeat this process to add all relevant values. Any values that are not specified, and data related to them, will be excluded from the schema.
    You can delete values by selecting them and clicking Delete Delete icon, or Delete All Delete All icon to remove all values.
  10. Click OK to return to the Sample dialog box.
  11. To save the information entered here so it can be reused with another data source with the same structure, click Save.
  12. Click Sample.

To sample data using View definitions:

  1. Launch the Schema Editor.
  2. Select your DSN from the list, or enter an appropriate connection string.
  3. Click Connect.
  4. Under Create A New Schema Definition, click Create New.
    Schema Editor shows a list of databases and collections from the data source.
  5. Select the collection or collections you wish to sample from the list. Set up any Table Delimiters you want to use according to the previous procedure.
  6. To add a new view definition, do the following:
    1. Under the View Definitions column for the appropriate collection, click Add.
    2. Click New.
    3. Type a unique name for the View Definition, then click OK.
    4. In the Edit View field, type a query that retrieves the information you want to view.
    5. To add an additional view, click Add.
      To remove views from the list, use Delete.
    6. To return to the Sample dialog box, click OK.
  7. Click Sample.

The connector samples the data as specified and generates a schema definition, which opens in the Design View in the Schema Editor.