Skip to content

Data Query Service

Idea

The Data Query Service can be used to create, update, retrieve, delete and execute queries.

Access

For accessing this service, you need to have the respective roles listed in Data Contextualization roles and scopes.

The Data Query Service APIs can be accessed in either of the following ways:

  1. Using REST client: Users of Data Contextualization admin role can create, update, delete and retrieve queries from the Data Query APIs. In future, users would also get access to in-built query editors.
  2. Using Apps: Analytical applications built on Data Contextualization can use queried data using query results. The query results API can be integrated with analytical applications.

Application users can access the REST APIs using REST Client. Depending on the APIs, users will need different roles to access the Data Query Service. The Data Query Service APIs can be accessed by Data Contextualization admin role or technical user roles. The only exception is Query Results API can be accessed by Data Contextualization admin, technical user and Data Contextualization user roles.

Note

Access to the Data Query Service APIs are protected by authentication methods, using OAUTH credentials.

Basics

You can create and execute business queries as well as physical queries using the Data Query Service. Business queries are validated based on selected semantic model and physical queries are validated based on physical schemas of data within a tenant. Users need to create queries by using native SQL query format. For more information about how to formulate queries, refer How to create queries.

Data Contextualization provides state-of-art native SQL query engine to query data across disparate sources. Users can now leverage in-built query functionality to gain insights into semantically correlated data. Queries can be either Physical-schema based or semantic based queries. They can be static queries created by data analysts & consumed by end-users or dynamic queries can be created which help users to create multiple execution requests with varied range of input parameters on a single query.

This helps users to dynamically provide range of filter values on which correlated queried data results can be consumed. Further, using standard Native SQL query functions different aggregate functions can be used to create different KPIs analytics and will support variety of use-cases.

Features

Data Contextualization supports custom Native SQL queries. Native SQL queries format support both static & dynamic type of queries. Dynamic queries are parameterized queries where dynamic filter inputs can be provided and aliases can be provided to query result attributes. Operators like Where, inner join, select, order by, from, etc. are supported. Query service supports both semantic queries and physical schema-based queries. If you have created semantic queries based on incomplete semantic model, Data Contextualization allows the users to save queries but to get query results; user need to update all the details in the semantic models. Physical schema-based queries are validated against physical schemas. Only validated queries can be saved and are validated against the provided schemas.

Native SQL based queries supports both static and dynamic (Parameterized queries and alias for query result column names) queries.

In a SQL query, the column values can be defined either in where clause or having clause. If the user provides values for column attributes during query creation, then that query is considered as static query. Users cannot create execution Job requests for the static query, since the system creates it after the query creation. If the user updates the query after validation, the system re-executes the query.

In case of query is categorized as dynamic query (isDynamic=TRUE), if a user wants to provide different values for parameters at the time of execution or needs to get different alias (display name) for columns in the select clause, then the user can set the isDynamic flag to true and provide values for parameters and aliases at the time of execution of job creation. The user can create multiple execution jobs.

If dynamic query gets updated, then all the execution jobs will be marked as obsolete. The user will get outdated results against obsolete execution jobs. The user then needs to create a new execution job to get new results.

If data ingest happens on the affected schemas of corresponding queries, the system will validate all the affected queries due to schema change.
If validation fails, then

  • In case of static query, the user will not receive the result of execution. In case of valid query, the user can see outdated results for few seconds and then new results.
  • In case of dynamic query, all the execution jobs will be marked as obsolete and the user gets the outdated results against those ids.

All the execution jobs which are marked as obsolete will be deleted after 3 months of Obsolete status. The cached execution results will be also deleted.

If query validation is successful, then

  • All the queries will be re-executed and the user will get the current results post execution. For all the queries wherein the execution is not complete, the user will get outdated results.

Data Contextualization validates every query created by the user. If the query is not executable, then Data Contextualization provides "executable" flag to False. This helps the users to understand that the query cannot be executed. To modify the query, the user can analyze the "pendingActions" provided for users to modify the query. Once the "executable" flag is TRUE, then the user can get the results with status of execution. Currently, Data Contextualization provides "current", "outdated" and "obsolete" for query results provided.

Data Contextualization currently supports UTC format dates (for example, 2020-02-15T04:46:13Z) and W3C format dates (for example, 2020-10-15T04:46:13+00:00) in the queries.

Note

Native SQL should be encoded using base64 encoder before adding to request.

Note

All the execution jobs which are marked as obsolete will be deleted after 3 months of Obsolete status. The cached execution results will be also deleted.

Limitations

  • The timeout period for Query Results and ExecutionJob Results API is 60 seconds.
  • The maximum concurrent request for Data Contextualization is restricted to 100 for each tenant.
  • There will be soft-limit and hard-limit to number of executionJobs for every tenant. For more details, contact support.
  • There will be soft-limit and hard-limit to number of select columns in the query. For more details, contact support.
  • The maximum payload for query results response can be 250 MB. When query results exceed this limit, Data Contextualization returns partial results with a warning message included in the query response header.
  • In where clause, if column value is a "string", then the user needs to pass the value within single quotes '{value}'.
  • The user can only create select queries.

Native SQL based query Limitations

  • In the native SQL query, if schema name or property name contains any special character, then enclose it with backtick(`).
  • The current version does not support following SQL operations: top, select *, ALL, ANY, COUNT( * )
  • FULL OUTER JOIN is not supported, instead user can use FULL JOIN as FULL OUTER JOIN and FULL JOIN are the same.
  • In queries having BETWEEN function, user should not use 'BETWEEN #{date}# AND #{date}#' format, instead use BETWEEN '{date}' AND '{date}' format.
  • In queries for string concatenation use CONCAT function.

For more information, refer Release notes.

Example Scenario

Application developers wants to integrate query results API into dashboard application to receive queried data. Data is ingested into Data Contextualization from Enterprise Resource Planning (ERP) and IoT data after which schemas are generated automatically by Data Contextualization. Data analysts can formulate queries on basis of semantic model or different schemas from data catalog. Each query is uniquely identified by queryid. Application developers can use queryid and integrate query results API into dashboard application to receive queried data.

Further, dynamic query functionality help users to create multiple execution requests on a single query. For eg: If user wants to create a dashboard that exhibits information about plant performance from different locations. Then app developers can create queries to provide same data attributes for different plants and value of location can be sent dynamically over a different execution requests.

This helps users to dynamically provide range of filter values on which correlated queried data results can be consumed. Finally, standard Native SQL query engine provide ability to create different aggregate functions in the query that can be used to create different KPIs analytics and will support variety of use-cases to gain insights from data from disparate sources.