Sampling Data for Couchbase

You can use the options in the Sample View to specify how the connector samples data from Couchbase to generate a schema definition. The connector samples the data in order to detect its structure and determine the data mappings that best support the data.
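The following sketch shows in general terms what sampling data "to detect its structure" involves. It is not the connector's algorithm. The function names infer_structure and json_type are hypothetical. Treating the Sampling Count value as "examine the first N documents" is also an assumption: the connector may select records differently. The sketch records every field path it sees in a set of JSON documents, together with the JSON types observed at each path. A field whose type varies from one document to another is exactly the kind of variation a schema definition has to accommodate.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def json_type(value: Any) -> str:
    """Map a decoded JSON value to its JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_structure(docs: Iterable[Dict[str, Any]], sampling_count: int = 0) -> Dict[str, Set[str]]:
    """Hypothetical helper: collect field paths and the JSON types seen at each path.

    sampling_count mimics the Sampling Count option: 0 means examine every
    document; otherwise at most that many documents are examined (assumed here
    to be the first N, which may not match the connector's selection).
    """
    if sampling_count < 0:
        raise ValueError("sampling_count must be 0 or a positive integer")
    observed: Dict[str, Set[str]] = defaultdict(set)

    def walk(value: Any, path: str) -> None:
        observed[path].add(json_type(value))
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}")  # nested object field
        elif isinstance(value, list):
            for item in value:
                walk(item, f"{path}[]")  # array element

    for i, doc in enumerate(docs):
        if sampling_count and i >= sampling_count:
            break
        for key, child in doc.items():
            walk(child, key)
    return dict(observed)
```

Consider two documents in which price is 10 in one and "n/a" in the other. infer_structure(docs)["price"] contains both "number" and "string". infer_structure(docs, sampling_count=1)["price"] contains only "number". This is why sampling too few records can produce a less accurate schema definition.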

Important:

Before generating a schema definition, make sure that primary indexes have been created for all of the buckets in your Couchbase database.
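A primary index is created with one N1QL statement per bucket, of the form CREATE PRIMARY INDEX ON `bucket-name`. The sketch below only builds the statement text and does not connect to Couchbase. You would run its output in a query tool such as the Query Workbench or the cbq shell. The helper primary_index_statements is hypothetical, and the bucket names are placeholders (travel-sample and beer-sample are Couchbase sample buckets). Wrapping each name in backticks lets names that contain hyphens or periods parse as a single identifier. The sketch rejects names that contain a backtick rather than attempting to escape them.

```python
from typing import Iterable, List


def primary_index_statements(bucket_names: Iterable[str]) -> List[str]:
    """Hypothetical helper: build one CREATE PRIMARY INDEX statement per bucket."""
    statements = []
    for name in bucket_names:
        if "`" in name:
            raise ValueError(f"unexpected backtick in bucket name: {name!r}")
        # Backticks quote the keyspace name so characters such as '-' are allowed.
        statements.append(f"CREATE PRIMARY INDEX ON `{name}`")
    return statements


for stmt in primary_index_statements(["travel-sample", "beer-sample"]):
    print(stmt)
# CREATE PRIMARY INDEX ON `travel-sample`
# CREATE PRIMARY INDEX ON `beer-sample`
```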

To sample data for Couchbase:

  1. Choose one:

    • From the start page, click Create New, provide your connection information, and then click Connect. For detailed information about how to specify connection information, see Connecting to a Data Store.

      The Sample dialog box opens.

    • From the Design View, click the Sample View tab. If you are not already connected to a data store, the Schema Editor prompts you to provide your connection information. For detailed information about how to specify connection information, see Connecting to a Data Store.

      The Schema Editor displays the Sample View.

  2. In the Sampling Count field, type the maximum number of records that the connector can sample to generate the schema definition. To sample every record in the database, set this option to 0.

    Note:

    Typically, sampling a large number of records results in a schema definition that is more accurate and better able to represent all the data in the database. However, the sampling process takes longer as the number of sampled records increases, especially if the database contains complex, nested data structures.

  3. In the bottom pane, specify the collections that the connector samples records from by selecting the corresponding check boxes in the Selected column. You can select every collection in the database by selecting the check box in the Selected column header.

    Note:

    You can group and sort collections by clicking a column header. For example, to group collections based on the catalogs they belong to and then sort those groupings by catalog name in ascending order, click the Catalog column header. To sort the list in descending order, click the header again. To disable sorting, click the header a third time.

  4. In each field in the Type Name Filter column, type the name of the attribute that the bucket (or collection) uses to specify document types.

    When generating a schema definition, the connector creates a base table for each distinct document type. For example, if you specify type as the type name filter for a bucket, and that bucket contains documents that have the type values retail and internal, then the connector creates two tables named "retail" and "internal" in the schema definition.

    Important:

    Although specifying a type name filter is optional, we strongly recommend that you do so. If you do not provide a type name filter, the connector uses the bucket name as the type for all the documents in the bucket, and the resulting schema definition is likely to contain errors.

    In addition, make sure that you specify the type attribute for every data bucket in the Couchbase Server instance or cluster to which you are connecting. Otherwise, you might encounter issues when working with document types that are not part of the schema definition.

  5. To generate the schema definition, click Sample.

The connector samples the data as specified and generates a schema definition, which opens in the Design View in the Schema Editor. If you return to the Sample View, the check boxes in the Sampled column are selected for all the collections that were included in the sampling process.
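The following sketch groups sampled documents into candidate base tables to make the type name filter behavior concrete. With a filter, each distinct value of the attribute becomes a table. Without one, the bucket name serves as the type for every document. This is an illustration, not the connector's implementation. The function group_into_tables and the bucket name orders are hypothetical. This section does not say how the connector handles documents that lack the type attribute, so the sketch assumes that such documents fall back to the bucket name.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def group_into_tables(
    bucket_name: str,
    docs: List[Dict[str, Any]],
    type_name_filter: Optional[str] = None,
) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical helper: group documents into one candidate table per type value."""
    tables: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for doc in docs:
        if type_name_filter is None:
            # No filter: every document is typed by the bucket name.
            table = bucket_name
        else:
            # Assumption: a document without the attribute falls back to the bucket name.
            table = str(doc.get(type_name_filter, bucket_name))
        tables[table].append(doc)
    return dict(tables)


docs = [
    {"type": "retail", "sku": 1},
    {"type": "internal", "owner": "ops"},
    {"type": "retail", "sku": 2},
]
print(sorted(group_into_tables("orders", docs, "type")))  # ['internal', 'retail']
print(sorted(group_into_tables("orders", docs)))          # ['orders']
```

Without a filter, the retail and internal documents collapse into a single table even though they have different fields. This illustrates why the resulting schema definition is likely to be inaccurate.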