Google BigQuery Data Model

Connection String Options

  1. Allow Large Result Sets
  2. Auto Cache
  3. Cache Connection
  4. Cache Driver
  5. Cache Location
  6. Cache Metadata
  7. Connect On Open
  8. Dataset Id
  9. Destination Table
  10. Firewall Password
  11. Firewall Port
  12. Firewall Server
  13. Firewall Type
  14. Firewall User
  15. Google Big Query Options
  16. Initiate OAuth
  17. Location
  18. Logfile
  19. Max Log File Size
  20. OAuth Access Token
  21. OAuth Client Id
  22. OAuth Client Secret
  23. OAuth JWT Cert
  24. OAuth JWT Cert Password
  25. OAuth JWT Cert Subject
  26. OAuth JWT Cert Type
  27. OAuth JWT Issuer
  28. OAuth JWT Subject
  29. OAuth Refresh Token
  30. OAuth Settings Location
  31. Offline
  32. Other
  33. Page Size
  34. Polling Interval
  35. Project Id
  36. Proxy Auth Scheme
  37. Proxy Auto Detect
  38. Proxy Password
  39. Proxy Port
  40. Proxy Server
  41. Proxy SSL Type
  42. Proxy User
  43. Pseudo Columns
  44. Query Cache
  45. Query Passthrough
  46. Readonly
  47. RTK
  48. SSL Server Cert
  49. Tables
  50. Temp Table Dataset
  51. Temp Table Expiration Time
  52. Timeout
  53. Verbosity

Allow Large Result Sets

Data Type

bool

Default Value

true

Remarks

Whether to allow large result sets to be stored in temporary tables.

 

Auto Cache

Data Type

bool

Default Value

false

Remarks

When AutoCache is set, the driver automatically caches the results of SELECT queries to a cache specified by the CacheLocation option. CacheLocation defines the path to a simple, file-based cache.

AutoCache is the simplest caching configuration available. However, like any caching scheme, it has pitfalls, such as reporting on stale data. The driver is designed to be fully functional without relying on caching.

The following sections outline how and when to use AutoCache. Understanding how AutoCache works and its limitations will help you choose an effective caching strategy. For more information on deploying other caching strategies, see Caching Data.

How AutoCaching Works

 

When you execute a SELECT statement with AutoCache set, the driver executes the query against the remote data and persists the results; rows and columns that already exist are overwritten. That is, SELECT statements are used to create and refresh the cache, not to query it.

Non-queries (such as UPDATE, INSERT, and DELETE statements) are also executed against the remote data; these statements do not modify the data in the cache, regardless of the value set for AutoCache.

To work with the local data, append #CACHE to the table name. For example:

SELECT * FROM [publicdata:samples.github_nested#CACHE]
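
The refresh-then-read pattern described above might look like the following in JDBC. This is a minimal sketch, not the driver's own sample code: the AutoCacheSketch class and its helper names are hypothetical, it assumes a java.sql.Connection that was already opened with AutoCache set, and it assumes that the bracketed name without the #CACHE suffix addresses the remote table.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class AutoCacheSketch {

    // Bracketed reference to the remote table (assumed form, without the #CACHE suffix).
    static String remoteTable(String table) {
        return "[" + table + "]";
    }

    // Bracketed reference to the locally cached copy, using the #CACHE suffix.
    static String cachedTable(String table) {
        return "[" + table + "#CACHE]";
    }

    // Refreshes the cache with a remote SELECT, then counts the rows in the cached copy.
    // Assumes conn was opened with AutoCache set (connection setup is not shown here).
    static int refreshThenCount(Connection conn, String table) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // With AutoCache set, this SELECT runs remotely and its results are persisted.
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM " + remoteTable(table))) {
                while (rs.next()) {
                    // Read through the remote results.
                }
            }
            // The #CACHE suffix directs this query to the local cache.
            int rows = 0;
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM " + cachedTable(table))) {
                while (rs.next()) {
                    rows++;
                }
            }
            return rows;
        }
    }

    public static void main(String[] args) {
        // Prints the cache query from the example above.
        System.out.println("SELECT * FROM " + cachedTable("publicdata:samples.github_nested"));
    }
}
```

Keeping the suffix in one helper makes it harder to read from the cache by accident when a refresh was intended, or the reverse.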

Limitations of AutoCache

 

In the following scenarios, consider the alternatives listed below:

  • When you need to work with the cache transparently: Because AutoCache requires a special syntax to use the cache, it is not suitable for BI, analytics, and reporting tools. Many of these tools generate SQL statements for you; these generated statements are still executed against Google BigQuery instead of the cache.

    In these situations, one solution is to use the Offline property. When this is set, all queries are executed locally. See Caching: Best Practices for examples.

    One downside of this approach is that it requires a separate connection. As an alternative, consider using the CData Sync tool to maintain a local database that is kept fresh with scheduled updates.

     

  • When you need more control over cached data: The AutoCache feature does not have the ability to remove rows from the cache that were deleted from the remote data. It also does not support dropping a table from the cache or more advanced cache maintenance such as changing the cached table schemas.

    In this scenario, consider CACHE Statements. CACHE statements can remove cached rows that no longer exist in Google BigQuery.

    See Caching Explicitly for more information on how to use CACHE statements.

  • When you need to work with an RDBMS: AutoCache can only be used with the default database, JavaDB or SQLite. Many enterprises will need to use an RDBMS to support more concurrent writes or integrate with existing infrastructure. You can specify a database driver with CacheConnection and CacheDriver.

 

 

Cache Connection

Data Type

string

Default Value

""

Remarks

The cache database is determined based on the CacheDriver and CacheConnection properties. The CacheConnection defines the connection properties necessary to connect to the cache database.

Cache Driver=com.microsoft.sqlserver.jdbc.SQLServerDriver;Cache Connection='jdbc:sqlserver://localhost:7437;user=sa;password=123456;databaseName=Cache'

 

 

 

Cache Driver

Data Type

string

Default Value

""

Remarks

You can cache to any database that you have a JDBC driver for. The driver has been tested with SQL Server, Derby and Java DB, MySQL, Oracle, and SQLite.

The cache database is determined based on CacheDriver and the CacheConnection properties. The CacheDriver is the name of the JDBC driver class that you would like to use to cache data. The example below caches to SQL Server:

Cache Driver=com.microsoft.sqlserver.jdbc.SQLServerDriver;Cache Connection='jdbc:sqlserver://localhost:7437;user=sa;password=123456;databaseName=Cache'

Note that the driver JAR must be specified on the classpath.
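
As a rough illustration of how the two options combine, the sketch below assembles the option string shown above from name-value pairs, wrapping any value that contains a semicolon in single quotes so that it is not split apart. The CacheOptionsSketch class and its quote and join helpers are hypothetical, and the quoting rule is inferred from the example rather than taken from a documented grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CacheOptionsSketch {

    // Wraps a value in single quotes when it contains ';' (rule inferred from the example).
    static String quote(String value) {
        return value.contains(";") ? "'" + value + "'" : value;
    }

    // Joins options as Name=Value pairs separated by semicolons, in insertion order.
    static String join(Map<String, String> options) {
        StringJoiner joiner = new StringJoiner(";");
        for (Map.Entry<String, String> entry : options.entrySet()) {
            joiner.add(entry.getKey() + "=" + quote(entry.getValue()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("Cache Driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver");
        options.put("Cache Connection",
                "jdbc:sqlserver://localhost:7437;user=sa;password=123456;databaseName=Cache");
        System.out.println(join(options));
    }
}
```

A LinkedHashMap is used so that the options appear in the order they were added, which keeps the output identical to the documented example.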

 

 

Cache Location

Data Type

string

Default Value

""

Remarks

If AutoCache is set but the cache location is not specified, CacheLocation defaults to the cache folder in the directory specified by the Location setting.

The CacheLocation is a simple, file-based cache. See the CacheConnection and CacheDriver properties to cache to other databases.

 

Cache Metadata

Data Type

bool

Default Value

false

Remarks

The cache.db file is created in the location specified by CacheConnection or, if that is not set, by CacheLocation.

 

Connect On Open

Data Type

bool

Default Value

true

Remarks

When set to 'true', a connection will be made to Google BigQuery when the connection is opened. This property enables the 'Test Connection' feature available in various database tools.

This initial connection acts as a no-op: it only verifies that a connection can be made to Google BigQuery, and nothing from it is maintained.

Setting this property to 'false' may provide performance improvements (depending upon the number of times a connection is opened).

 

Dataset Id

Data Type

string

Default Value

""

Remarks

The Id of the dataset whose tables you want to connect to and view.

 

Destination Table

Data Type

string

Default Value

""

Remarks

Queries that are allowed to return large results take longer to execute, even if the actual result set is small, and are subject to additional limitations:

  • A destination table is created for the result set.
  • You cannot specify a top-level ORDER BY clause.
  • Large data volumes cannot use the TOP function.

Note: The default write mode for this table is WRITE_TRUNCATE, so each query drops the existing table and writes the new result set. Specify a different DestinationTable for each connection.

 

 

Firewall Password

Data Type

string

Default Value

""

Remarks

If FirewallServer is specified, the FirewallUser and FirewallPassword properties are used to connect and authenticate to the given firewall.

 

Firewall Port

Data Type

string

Default Value

""

Remarks

Note that the driver sets the FirewallPort to the default port associated with the specified FirewallType. See the description of the FirewallType option for details.

 

Firewall Server

Data Type

string

Default Value

""

Remarks

If this property is set to a domain name, a DNS request is initiated and the name is translated to the corresponding IP address.

 

Firewall Type

Data Type

string

Default Value

"NONE"

Remarks

The applicable values are:

 

   
Firewall Type    Default FirewallPort
TUNNEL           80
SOCKS4           1080
SOCKS5           1080
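
The defaulting behavior can be pictured as a simple lookup. The sketch below is illustrative only: the FirewallPortSketch class is hypothetical, and it assumes that an explicitly supplied FirewallPort takes priority over the default for the type, which the text above does not state outright.

```java
import java.util.Locale;
import java.util.Map;
import java.util.OptionalInt;

final class FirewallPortSketch {

    // Default ports taken from the table above.
    private static final Map<String, Integer> DEFAULT_PORTS =
            Map.of("TUNNEL", 80, "SOCKS4", 1080, "SOCKS5", 1080);

    // Returns the explicit port when one is given (assumed precedence),
    // otherwise the default for the firewall type, or empty for types
    // without a listed default, such as NONE.
    static OptionalInt effectivePort(String firewallType, String firewallPort) {
        if (firewallPort != null && !firewallPort.isEmpty()) {
            return OptionalInt.of(Integer.parseInt(firewallPort));
        }
        if (firewallType == null) {
            return OptionalInt.empty();
        }
        Integer port = DEFAULT_PORTS.get(firewallType.toUpperCase(Locale.ROOT));
        return port == null ? OptionalInt.empty() : OptionalInt.of(port);
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("SOCKS5", ""));
        System.out.println(effectivePort("NONE", ""));
    }
}
```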

 

 

Firewall User

Data Type

string

Default Value

""

Remarks

If the FirewallServer is specified, the FirewallUser and FirewallPassword properties are used to connect and authenticate against the firewall.

 

Google Big Query Options

Data Type

string

Default Value

""

Remarks

A list of Google BigQuery options:

 

   
Option Description
gbqoImplicitJoinAsUnion This option will prevent the driver from converting an IMPLICIT JOIN into a CROSS JOIN as expected by SQL92. Instead, it will leave it as an IMPLICIT JOIN, which Google BigQuery will execute as a UNION ALL.

 

 

Initiate OAuth

Data Type

string

Default Value

"OFF"

Remarks

The following options are available:

  1. OFF: Indicates that the OAuth flow will be handled entirely by the user. An OAuthAccessToken will be required to authenticate.
  2. GETANDREFRESH: Indicates that the entire OAuth Flow will be handled by the driver. If no token currently exists, it will be obtained by prompting the user via the browser. If a token exists, it will be refreshed when applicable.
  3. REFRESH: Indicates that the driver will only handle refreshing the OAuthAccessToken. The user will never be prompted by the driver to authenticate via the browser. The user must handle obtaining the OAuthAccessToken and OAuthRefreshToken initially.

 

 

Location

Data Type

string

Default Value

""

Remarks

The path to a directory which contains the schema files for the driver (.rsd files for tables and views, .rsb files for stored procedures). The Location property is only needed if you would like to customize definitions (e.g., change a column name, ignore a column, etc.) or extend the data model with new tables, views, or stored procedures.

The schema files used in your application must be deployed with other assemblies. You must also ensure that Location points to the folder that contains the schema files. The folder location can be a relative path from the location of the executable.

 

Logfile

Data Type

string

Default Value

""

Remarks

For more control over what is written to the log file, take a look at Verbosity.

 

Max Log File Size

Data Type

string

Default Value

""

Remarks

A string specifying the maximum size in bytes for a log file (ex: 10MB). When the limit is hit, a new log is created in the same folder with the date and time appended to the end. There is no limit by default. Values lower than 100kB will use 100kB as the value instead.

 

OAuth Access Token

Data Type

string

Default Value

""

Remarks

The OAuthAccessToken property is used to connect using OAuth. The OAuthAccessToken is retrieved from the OAuth server as part of the authentication process. It has a server-dependent timeout and can be reused between requests.

The access token is used in place of your username and password. The access token protects your credentials by keeping them on the server.

 

OAuth Client Id

Data Type

string

Default Value

""

Remarks

OAuth requires you to register your application. As part of the registration, you will receive a client Id, sometimes also called a consumer key, and a client secret. You must specify both the OAuthClientId and OAuthClientSecret to connect to an OAuth server.

 

OAuth Client Secret

Data Type

string

Default Value

""

Remarks

OAuth requires you to register your application. As part of the registration you will receive a client Id and a client secret, sometimes also called a consumer secret. You must specify both the OAuthClientId and OAuthClientSecret to connect to an OAuth server.

 

OAuth JWT Cert

Data Type

string

Default Value

""

Remarks

The name of the certificate store for the client certificate.

The OAuthJWTCertType field specifies the type of the certificate store specified by OAuthJWTCert. If the store is password protected, specify the password in OAuthJWTCertPassword.

OAuthJWTCert is used in conjunction with the OAuthJWTCertSubject field in order to specify client certificates. If OAuthJWTCert has a value, and OAuthJWTCertSubject is set, a search for a certificate is initiated. Please refer to the OAuthJWTCertSubject field for details.

Designations of certificate stores are platform-dependent.

The following are designations of the most common User and Machine certificate stores in Windows:

 

 

   
MY      A certificate store holding personal certificates with their associated private keys.
CA      Certifying authority certificates.
ROOT    Root certificates.
SPC     Software publisher certificates.

 

In Java, the certificate store normally is a file containing certificates and optional private keys.

When the certificate store type is PFXFile, this property must be set to the name of the file. When the type is PFXBlob, the property must be set to the binary contents of a PFX file (i.e. PKCS12 certificate store).

 

OAuth JWT Cert Password

Data Type

string

Default Value

""

Remarks

If the certificate store is of a type that requires a password, this property is used to specify that password in order to open the certificate store.

 

OAuth JWT Cert Subject

Data Type

string

Default Value

"*"

Remarks

When loading a certificate the subject is used to locate the certificate in the store.

If an exact match is not found, the store is searched for subjects containing the value of the property.

If a match is still not found, the property is set to an empty string, and no certificate is selected.

The special value "*" picks the first certificate in the certificate store.

The certificate subject is a comma separated list of distinguished name fields and values. For instance "CN=www.server.com, OU=test, C=US, E=support@cdata.com". Common fields and their meanings are displayed below.

 

   
Field    Meaning
CN       Common Name. This is commonly a host name like www.server.com.
O        Organization
OU       Organizational Unit
L        Locality
S        State
C        Country
E        Email Address

 

If a field value contains a comma it must be quoted.
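
The lookup order described above (the special value "*", then an exact match, then a substring match, then no selection) can be sketched as follows. This is a hypothetical illustration, not the driver's implementation: the CertSubjectSketch class is invented for the example, the store is reduced to a list of subject strings, and it assumes case-sensitive comparison with the first matching subject winning.

```java
import java.util.List;
import java.util.Optional;

final class CertSubjectSketch {

    // Selects a subject from the store, following the lookup order described above.
    static Optional<String> select(List<String> storeSubjects, String wanted) {
        if (storeSubjects.isEmpty()) {
            return Optional.empty();
        }
        // The special value "*" picks the first certificate in the store.
        if ("*".equals(wanted)) {
            return Optional.of(storeSubjects.get(0));
        }
        // Step 1: look for an exact match.
        for (String subject : storeSubjects) {
            if (subject.equals(wanted)) {
                return Optional.of(subject);
            }
        }
        // Step 2: fall back to subjects that contain the value.
        for (String subject : storeSubjects) {
            if (subject.contains(wanted)) {
                return Optional.of(subject);
            }
        }
        // Step 3: no match, so no certificate is selected.
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> store = List.of(
                "CN=www.server.com, OU=test, C=US",
                "CN=other.example.com, O=Example");
        System.out.println(select(store, "OU=test"));
    }
}
```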

 

OAuth JWT Cert Type

Data Type

string

Default Value

""

Remarks

This property can take one of the following values:

 

   
USER - default        For Windows, this specifies that the certificate store is a certificate store owned by the current user. Note: this store type is not available in Java.
MACHINE               For Windows, this specifies that the certificate store is a machine store. Note: this store type is not available in Java.
PFXFILE               The certificate store is the name of a PFX (PKCS12) file containing certificates.
PFXBLOB               The certificate store is a string (base-64-encoded) representing a certificate store in PFX (PKCS12) format.
JKSFILE               The certificate store is the name of a Java key store (JKS) file containing certificates. Note: this store type is only available in Java.
JKSBLOB               The certificate store is a string (base-64-encoded) representing a certificate store in Java key store (JKS) format. Note: this store type is only available in Java.
PEMKEY_FILE           The certificate store is the name of a PEM-encoded file that contains a private key and an optional certificate.
PEMKEY_BLOB           The certificate store is a string (base-64-encoded) that contains a private key and an optional certificate.
PUBLIC_KEY_FILE       The certificate store is the name of a file that contains a PEM- or DER-encoded public key certificate.
PUBLIC_KEY_BLOB       The certificate store is a string (base-64-encoded) that contains a PEM- or DER-encoded public key certificate.
SSHPUBLIC_KEY_FILE    The certificate store is the name of a file that contains an SSH-style public key.
SSHPUBLIC_KEY_BLOB    The certificate store is a string (base-64-encoded) that contains an SSH-style public key.
P7BFILE               The certificate store is the name of a PKCS7 file containing certificates.
PPKFILE               The certificate store is the name of a file that contains a PPK (PuTTY Private Key).
XMLFILE               The certificate store is the name of a file that contains a certificate in XML format.
XMLBLOB               The certificate store is a string that contains a certificate in XML format.

 

 

OAuth JWT Issuer

Data Type

string

Default Value

""

Remarks

The issuer of the JSON Web Token (JWT). This is typically either the Client ID or Email Address of the OAuth Application.

 

OAuth JWT Subject

Data Type

string

Default Value

""

Remarks

The user subject for which the application is requesting delegated access. Typically, the user account name or email address.

 

OAuth Refresh Token

Data Type

string

Default Value

""

Remarks

The OAuthRefreshToken property is used to refresh the OAuthAccessToken when using OAuth authentication.

 

OAuth Settings Location

Data Type

string

Default Value

"%APPDATA%\\CData\\GoogleBigQuery Data Provider\\OAuthSettings.txt"

Remarks

When InitiateOAuth is set to GETANDREFRESH or REFRESH, the driver saves OAuth values to a settings file to avoid requiring the user to manually enter OAuth connection properties. The default OAuthSettingsLocation is a settings file located in the %AppData%\CData folder.

 

Offline

Data Type

bool

Default Value

false

Remarks

When Offline is set to TRUE, all queries execute against the cache as opposed to the live data source. In this mode, certain queries like INSERT, UPDATE, DELETE, and CACHE are not allowed.

 

Other

Data Type

string

Default Value

""

Remarks

The Other property is a semicolon-separated list of name-value pairs used in connection parameters specific to a data source.

Caching Configuration

 

   
CachePartial=True Caches only a subset of columns, which you can specify in your query.
QueryPassthrough=True Passes the specified query to the cache database instead of using the SQL parser of the driver.

 

Integration and Formatting

 

   
SupportAccessLinkedMode In Access' linked mode, it is generally a good idea to always use a cache, as most data sources do not support multiple Id queries. However, if you want to use the driver in Access but not in linked mode, this property must be set to False to avoid using a cache of a SELECT * query for the given table.
ConvertDateTimesToGMT Whether to convert date-time values to GMT, instead of the local time of the machine.
RecordToFile=filename Records the underlying socket data transfer to the specified file.
ClientCulture This property can be used to specify the format of data (e.g., currency values) that is accepted by the client application. This property can be used when the client application does not support the machine's culture settings. For example, Microsoft Access requires 'en-US'.
Culture This setting can be used to specify culture settings that determine how the driver interprets certain data types that are passed into the driver. For example, setting Culture='de-DE' will output German formats even on an American machine.

 

 

Page Size

Data Type

string

Default Value

"100000"

Remarks

The PageSize property controls the number of results returned per page from Google BigQuery. A larger page size returns more data in a single HTTP request, but each request may take longer to execute. A smaller page size increases the number of HTTP requests needed to retrieve all the data, but is generally recommended to ensure that timeout exceptions do not occur.
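
To make the trade-off concrete, the sketch below estimates how many page requests a result set needs at a given page size. It is a back-of-the-envelope illustration: the PageSizeSketch class is hypothetical, and it assumes one HTTP request per page, with at least one request even for an empty result.

```java
final class PageSizeSketch {

    // Estimated number of page requests needed to read totalRows rows,
    // computed as a ceiling division with a minimum of one request.
    static long requestsNeeded(long totalRows, long pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (totalRows < 0) {
            throw new IllegalArgumentException("totalRows must not be negative");
        }
        return Math.max(1, (totalRows + pageSize - 1) / pageSize);
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        // Default page size of 100000 versus a smaller page size of 10000.
        System.out.println(requestsNeeded(rows, 100_000L));
        System.out.println(requestsNeeded(rows, 10_000L));
    }
}
```

For one million rows, the default page size of 100000 needs 10 requests, while a page size of 10000 needs 100 smaller ones.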

 

Polling Interval

Data Type

string

Default Value

"2"

Remarks

This property applies only when DestinationTable is set or AllowLargeResultSets is true. It determines how long to wait between checks of whether the query's results are ready. Very large result sets or complex queries may take longer to process, and a low polling interval may result in many unnecessary requests being made to check the query status.
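
The effect of the interval on the number of status requests can be seen in a generic polling loop. The sketch below is not the driver's implementation: the PollingSketch class, the isReady callback, and the maxChecks limit are hypothetical, and the interval is passed in milliseconds because the unit of PollingInterval is not stated here.

```java
import java.util.function.BooleanSupplier;

final class PollingSketch {

    // Calls isReady up to maxChecks times, sleeping intervalMillis between checks.
    // Returns the number of checks made when the job became ready, or -1 if it
    // never became ready or the waiting thread was interrupted.
    static int pollUntilReady(BooleanSupplier isReady, long intervalMillis, int maxChecks) {
        for (int checks = 1; checks <= maxChecks; checks++) {
            if (isReady.getAsBoolean()) {
                return checks;
            }
            if (checks < maxChecks) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated job that reports ready on the third status check.
        int checks = pollUntilReady(() -> ++calls[0] >= 3, 10L, 10);
        System.out.println("status checks made: " + checks);
    }
}
```

Each unsuccessful check costs one status request, so for a job of fixed duration, halving the interval roughly doubles the number of requests.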

 

Project Id

Data Type

string

Default Value

""

Remarks

The Id of the project whose tables you want to connect to and view.

 

Proxy Auth Scheme

Data Type

string

Default Value

"BASIC"

Remarks

This value may be BASIC, DIGEST, NONE, NTLM, NEGOTIATE or PROPRIETARY.

 

Proxy Auto Detect

Data Type

bool

Default Value

true

Remarks

This indicates whether to use the default system proxy settings or not. Set ProxyAutoDetect to FALSE to use custom proxy settings. This takes precedence over other proxy settings.

 

Proxy Password

Data Type

string

Default Value

""

Remarks

If the ProxyServer is specified, the ProxyUser and ProxyPassword properties are used to connect and authenticate against the proxy.

 

Proxy Port

Data Type

string

Default Value

"80"

Remarks

See the description of the ProxyServer field for details.

 

Proxy Server

Data Type

string

Default Value

""

Remarks

If this property is set to a domain name, a DNS request is initiated and the name is translated to the corresponding address.

 

Proxy SSL Type

Data Type

string

Default Value

"AUTO"

Remarks

This value may be AUTO, ALWAYS, NEVER, or TUNNEL.

 

Proxy User

Data Type

string

Default Value

""

Remarks

If a ProxyServer is specified, the ProxyUser and ProxyPassword options are used to connect and authenticate against the proxy.

 

Pseudo Columns

Data Type

string

Default Value

""

Remarks

This setting is particularly helpful in Entity Framework, which does not allow you to set a value for a pseudo column unless it is a table column. The value of this connection setting is of the format "Table1=Column1, Table1=Column2, Table2=Column3". You can use the "*" character to include all tables and all columns; i.e., "*=*".
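
The value format can be read as a list of Table=Column pairs. The sketch below parses such a value into a map from table name to column names; the PseudoColumnsSketch class is hypothetical, and the parsing rules (trimming whitespace around each name, splitting on the first "=") are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PseudoColumnsSketch {

    // Parses "Table1=Column1, Table1=Column2, Table2=Column3" into table -> columns.
    static Map<String, List<String>> parse(String setting) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        if (setting == null || setting.trim().isEmpty()) {
            return result;
        }
        for (String pair : setting.split(",")) {
            String[] parts = pair.split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected Table=Column but got: " + pair.trim());
            }
            // Group columns under their table, keeping the order in which they appear.
            result.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>()).add(parts[1].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("Table1=Column1, Table1=Column2, Table2=Column3"));
        System.out.println(parse("*=*"));
    }
}
```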

 

Query Cache

Data Type

string

Default Value

""

Remarks

The QueryCache allows you to cache the results of a query in-memory and use them until the cache expires. Setting the QueryCache can improve performance if the same or similar (see below) queries are executed often. The in-memory query cache is shared across connections, so it can help with performance even if more than one connection is being used.

The cache manager for QueryCache will not only use the results in the cache for exactly the same query, but also for queries that represent a subset of data in the cached query. For example, in the following queries, the cache created while executing Query A will be used to obtain the results for both Query B and Query C.

SELECT * FROM Account; -- Query A
SELECT * FROM Account WHERE Name LIKE '%John'; -- Query B
SELECT Id, Name FROM Account LIMIT 10; -- Query C

 

Setting the QueryCache to zero disables in-memory caching.

 

Query Passthrough

Data Type

bool

Default Value

false

Remarks

This option passes the query to Google BigQuery as-is.

 

Readonly

Data Type

bool

Default Value

false

Remarks

If this property is set to true, the driver will allow only SELECT queries. INSERT, UPDATE, DELETE, and stored procedure queries will cause an error to be thrown.

 

RTK

Data Type

string

Default Value

""

Remarks

The RTK property may be used to license a build. Please see the included licensing file to see how to set this property. The runtime key is only available if you purchased an OEM license.

 

SSL Server Cert

Data Type

string

Default Value

""

Remarks

If using a TLS/SSL connection, this property can be used to specify the TLS/SSL certificate to be accepted from the server. Any other certificate that is not trusted by the machine will be rejected. This can take the form of a full PEM certificate, the path to a file containing the certificate, the public key, the MD5 thumbprint, or the SHA1 thumbprint. If not specified, any trusted certificate will be accepted. Use '*' to accept all certificates (not recommended, for security reasons).

 

Tables

Data Type

string

Default Value

""

Remarks

Listing the tables from some databases can be expensive. Providing a list of tables in the connection string improves the performance of the driver.

 

Temp Table Dataset

Data Type

string

Default Value

"_CDataTempTableDataset"

Remarks

The name of the dataset that will contain temporary tables when executing queries with large result sets.

 

Temp Table Expiration Time

Data Type

string

Default Value

"3600"

Remarks

Time, in seconds, until the temporary table expires. Set to 0 to have the table never expire. Otherwise, the minimum value is 3600 (one hour).

 

Timeout

Data Type

string

Default Value

"60"

Remarks

If the Timeout property is set to 0, operations do not time out: they run until they complete successfully or encounter an error condition.

If Timeout expires and the operation is not yet complete, the driver throws an exception.

 

Verbosity

Data Type

string

Default Value

"1"

Remarks

The verbosity level determines the amount of detail that the driver reports to the Logfile. Verbosity levels from 1 to 5 are supported. These are described below:

 

   
1 Setting Verbosity to 1 will log the query, the number of rows returned by it, the start of execution and the time taken, and any errors.
2 Setting Verbosity to 2 will log everything included in Verbosity 1, cache queries, and HTTP headers.
3 Setting Verbosity to 3 will additionally log the body of the HTTP requests.
4 Setting Verbosity to 4 will additionally log transport-level communication with the data source. This includes SSL negotiation.
5 Setting Verbosity to 5 will additionally log communication with the data source and additional details that may be helpful in troubleshooting problems. This includes interface commands.

The Verbosity should not be set to greater than 1 for normal operation. Substantial amounts of data can be logged at higher verbosities, which can delay execution times.




 

Views

  1. Datasets
  2. Projects

Datasets

Lists all the accessible datasets for a given project.

 

Columns

 

 

 

   
Name                        Type    Description
Id [KEY]                    String  The fully qualified, unique, opaque Id of the dataset.
Kind                        String  The resource type.
FriendlyName                String  A descriptive name for the dataset.
DatasetReference_ProjectId  String  A unique reference to the container project.
DatasetReference_DatasetId  String  A unique reference to the dataset, without the project name.

 

 

 

Projects

Lists all the projects for the authorized user.

 

Columns

 

 

 

   
Name                        Type    Description
Id [KEY]                    String  The unique identifier of the project.
Kind                        String  The resource type.
FriendlyName                String  A descriptive name for the project.
NumericId                   String  The numeric Id of the project.
ProjectReference_ProjectId  String  A unique reference to the project.