Creating a Data Source Name

Typically, after installing the Simba Apache Spark ODBC Connector, you need to create a Data Source Name (DSN). A DSN is a data structure that stores connection information so that it can be used by the connector to connect to Spark.

Alternatively, you can specify connection settings in a connection string or as connector-wide settings. Settings in the connection string take precedence over settings in the DSN, and settings in the DSN take precedence over connector-wide settings.

The following instructions describe how to create a DSN. For information about specifying settings in a connection string, see Using a Connection String. For information about connector-wide settings, see Configuring a DSN-less Connection.
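Because connection-string settings take precedence over DSN settings, a common pattern is to store defaults in a DSN and override individual values at connect time. A minimal sketch of assembling such a string (the DSN name, host, and key names below are hypothetical examples; consult your connector's documentation for the exact keys it accepts):

```python
# Sketch: building an ODBC connection string that overrides values stored in a DSN.
# "My Spark DSN", the host, and the key names are hypothetical examples.
settings = {
    "DSN": "My Spark DSN",        # defaults are loaded from this DSN
    "Host": "spark.example.com",  # overrides the Host stored in the DSN
    "Port": "10000",              # overrides the Port stored in the DSN
}
connection_string = ";".join(f"{key}={value}" for key, value in settings.items())
print(connection_string)  # DSN=My Spark DSN;Host=spark.example.com;Port=10000
```

Settings omitted from the string fall back to the DSN, and settings absent from both fall back to any connector-wide defaults.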

To create a Data Source Name:

  1. From the Start menu, go to ODBC Data Sources.

    Note:

    Make sure to select the ODBC Data Source Administrator that has the same bitness as the client application that you are using to connect to Spark.

  2. In the ODBC Data Source Administrator, click the Drivers tab, and then scroll down as needed to confirm that the Simba Apache Spark ODBC Connector appears in the alphabetical list of ODBC drivers that are installed on your system.
  3. Choose one:
    • To create a DSN that only the user currently logged into Windows can use, click the User DSN tab.
    • Or, to create a DSN that all users who log into Windows can use, click the System DSN tab.

    Note:

    It is recommended that you create a System DSN instead of a User DSN. Some applications load the data using a different user account, and might not be able to detect User DSNs that are created under another user account.

  4. Click Add.
  5. In the Create New Data Source dialog box, select Simba Apache Spark ODBC Connector and then click Finish. The Simba Spark ODBC Driver DSN Setup dialog box opens.
  6. In the Data Source Name field, type a name for your DSN.
  7. Optionally, in the Description field, type relevant details about the DSN.
  8. From the Spark Server Type drop-down list, select the appropriate server type for the version of Spark that you are running:
    • If you are running Shark 0.8.1 or earlier, then select SharkServer.
    • If you are running Shark 0.9, or Spark 1.1 or later, then select SparkThriftServer.
  9. Specify whether the connector uses the DataStax AOSS service when connecting to Spark, and provide the necessary connection information:
    • To connect to Spark without using the DataStax AOSS service, do the following:
      1. From the Service Discovery Mode drop-down list, select No Service Discovery.
      2. In the Host(s) field, type the IP address or host name of the Spark server.
      3. In the Port field, type the number of the TCP port that the Spark server uses to listen for client connections.
    • Or, to discover Spark services via the DataStax AOSS service, do the following:
      1. From the Service Discovery Mode drop-down list, select AOSS.
      2. In the Host(s) field, type a comma-separated list of AOSS endpoints. Use the following format, where [AOSS_Endpoint] is the IP address or host name of the AOSS endpoint, and [AOSS_Port] is the number of the TCP port that the AOSS endpoint uses to listen for client connections:

        [AOSS_Endpoint1]:[AOSS_Port1],[AOSS_Endpoint2]:[AOSS_Port2]

      3. If the AOSS endpoints require different authentication and SSL settings than the Spark service that you are connecting to, click Service Discovery Options and configure the options in the AOSS Options dialog box as needed. For more information, see Configuring AOSS Options.
  10. In the Database field, type the name of the database schema to use when a schema is not explicitly specified in a query.

    Note:

    You can still issue queries on other schemas by explicitly specifying the schema in the query. To inspect your databases and determine the appropriate schema to use, type the show databases command at the Spark command prompt.

  11. In the Authentication area, configure authentication as needed. For more information, see Configuring Authentication.

    Note:

    Shark Server does not support authentication. Most default configurations of Spark Thrift Server require User Name authentication. To verify the authentication mechanism that you need to use for your connection, check the configuration of your Hadoop / Spark distribution. For more information, see Authentication Mechanisms.

  12. Optionally, if the operations against Spark are to be done on behalf of a user that is different from the authenticated user for the connection, type the name of the user to be delegated in the Delegation UID field.

    Note:

    This option is applicable only when connecting to a Spark Thrift Server instance that supports this feature.

  13. From the Thrift Transport drop-down list, select the transport protocol to use in the Thrift layer.

    Note:

    For information about how to determine which Thrift transport protocols your Spark server supports, see Authentication Mechanisms.

  14. If the Thrift Transport option is set to HTTP, then to configure HTTP options such as custom headers, click HTTP Options. For more information, see Configuring HTTP Options.
  15. To configure the connector to connect to Spark through a proxy server, click Proxy Options. For more information, see Configuring a Proxy Connection.
  16. To configure client-server verification over SSL, click SSL Options. For more information, see Configuring SSL Verification.

    Note:

    If you selected User Name or Windows Azure HDInsight Emulator as the authentication mechanism, SSL is not available.

  17. To configure advanced connector options, click Advanced Options. For more information, see Configuring Advanced Options.
  18. To configure server-side properties, click Advanced Options and then click Server Side Properties. For more information, see Configuring Server-Side Properties.
  19. To configure logging behavior for the connector, click Logging Options. For more information, see Configuring Logging Options.
  20. To test the connection, click Test. Review the results as needed, and then click OK.

    Note:

    If the connection fails, then confirm that the settings in the Simba Spark ODBC Driver DSN Setup dialog box are correct. Contact your Spark server administrator as needed.

  21. To save your settings and close the Simba Spark ODBC Driver DSN Setup dialog box, click OK.
  22. To close the ODBC Data Source Administrator, click OK.
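If you configured AOSS service discovery above, the Host(s) value is a plain host:port list that splits cleanly into endpoint pairs. The following sketch illustrates the expected format (the endpoint names and port number are hypothetical examples):

```python
# Sketch: parsing the comma-separated AOSS endpoint list described above,
# i.e. [AOSS_Endpoint1]:[AOSS_Port1],[AOSS_Endpoint2]:[AOSS_Port2].
def parse_aoss_hosts(hosts: str) -> list[tuple[str, int]]:
    """Split 'host1:port1,host2:port2' into (host, port) tuples."""
    pairs = []
    for entry in hosts.split(","):
        # rpartition splits on the last colon, separating host from port
        host, _, port = entry.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs

print(parse_aoss_hosts("aoss1.example.com:9077,aoss2.example.com:9077"))
# [('aoss1.example.com', 9077), ('aoss2.example.com', 9077)]
```

Each entry must include both a host and a port; whitespace around the commas is tolerated in this sketch, but check whether your connector accepts it.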