Data Management Service Sample Application¶
Creating a Data Registry for two cross-domain data sources and uploading a file¶
The customer wants to analyze Design and Plant data, so they first create a data registry for each of these two sources. Assume that the Design data contains XML files and the Plant data contains CSV files; a data tag is therefore created for each source. The customer also wants to append the data from files generated by the design team, so that a new version is created when the input XML file differs structurally, or data is appended when only the data points change while the schema remains unchanged. The customer wants to replace the schema and data for the Plant input files.
Prerequisites¶
- You have the right assigned role or technical user credentials.
- The file to be uploaded is of a supported file format.
- Replace sdi_version with v3 or v4 depending on the Data Contextualization API version in all sample endpoints below.
Create a Data Registry for two cross-domain data sources¶
Two data registries are created, one for each source. This can be done using the endpoint:
POST /api/sdi/sdi_version/dataRegistries
For the Design data source, the body of the request is:
{
  "dataTag": "classification",
  "defaultRootTag": "ClassificationCode",
  "filePattern": "[a-zA-Z]+.xml",
  "fileUploadStrategy": "append",
  "sourceName": "Design",
  "metaDataTags": ["teamcenter"]
}
The result can be verified by checking the response:
{
"registryId": "24537F02B61706A223F9D764BD0255C8",
"sourceName": "design",
"dataTag": "classification",
"xmlProcessRules": [],
"metaDataTags": ["teamcenter"],
"defaultRootTag": "ClassificationCode",
"filePattern": "[a-z_A-Z0-9]+.xml",
"createdDate": "2019-10-21T16:16:08.783Z",
"updatedDate": "2019-10-21T16:16:08.783Z",
"mutable": false,
"fileUploadStrategy": "append",
"category": "ENTERPRISE"
}
For the Plant data source, the body of the request is:
{
  "dataTag": "plantprocess",
  "filePattern": "[a-zA-Z]+.csv",
  "fileUploadStrategy": "replace",
  "sourceName": "Plant",
  "metaDataTags": ["USAPlant"]
}
The result can be verified by checking the response:
{
"registryId": "3ADBD1C08D3625C5C0B2AEE9D06CC294",
"sourceName": "plant",
"dataTag": "plantprocess",
"xmlProcessRules": [],
"metaDataTags": ["USAPlant"],
"defaultRootTag": null,
"filePattern": "[a-z_A-Z0-9]+.csv",
"createdDate": "2019-10-21T16:16:56.466Z",
"updatedDate": "2019-10-21T16:16:56.466Z",
"mutable": false,
"fileUploadStrategy": "replace",
"category": "ENTERPRISE"
}
Once the data registries are created, the customer can perform uploads based on the generated data registries.
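The two requests above can be issued with any HTTP client. The following Python sketch shows one possible way to create both registries; the base URL, the token placeholder, and the way credentials are obtained are assumptions for illustration and depend on your environment and API version.
import requests

# Assumptions for illustration: base URL and token handling depend on your
# environment and the Data Contextualization API version (v3 or v4).
BASE_URL = "https://gateway.example.com/api/sdi/v4"   # hypothetical gateway
TOKEN = "<your-technical-user-or-service-token>"      # obtained via your auth flow

HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

registries = [
    {   # Design source: XML files, append strategy
        "dataTag": "classification",
        "defaultRootTag": "ClassificationCode",
        "filePattern": "[a-zA-Z]+.xml",
        "fileUploadStrategy": "append",
        "sourceName": "Design",
        "metaDataTags": ["teamcenter"],
    },
    {   # Plant source: CSV files, replace strategy
        "dataTag": "plantprocess",
        "filePattern": "[a-zA-Z]+.csv",
        "fileUploadStrategy": "replace",
        "sourceName": "Plant",
        "metaDataTags": ["USAPlant"],
    },
]

for body in registries:
    resp = requests.post(f"{BASE_URL}/dataRegistries", json=body, headers=HEADERS)
    resp.raise_for_status()
    # Keep the registryId; it is needed for the subsequent file uploads.
    print(body["sourceName"], "->", resp.json()["registryId"])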
Creating an advanced Data Registry for XML/PLMXML¶
Complex or nested XML can be transformed using xmlProcessRules. Users can ignore or flatten nested elements using the ignore or index rule. If the XML contains nested elements, add the corresponding rule to xmlProcessRules during registry creation.
Consider the following sample PLMXML input:
<Occurrence id="id33">
<UserData id="id32" type="AttributesInContext">
<UserValue value="" title="OccurrenceName"> </UserValue>
<UserValue value="1400" title="SequenceNumber"></UserValue>
<UserValue value="" title="ReferenceDesignator"></UserValue>
</UserData>
</Occurrence>
<ProductRevision id="id79" name="90214255__001__PART_WF" accessRefs="#id4" subType="ItemRevision" masterRef="#id80" revision="aa">
<AssociatedDataSet id="id81" dataSetRef="#id78" role="PhysicalRealization"></AssociatedDataSet>
<AssociatedDataSet id="id181" dataSetRef="#id180" role="PhysicalRealization"></AssociatedDataSet>
<AssociatedDataSet id="id205" dataSetRef="#id204" role="PhysicalRealization"></AssociatedDataSet>
</ProductRevision>
For the index rule, the value of the title attribute becomes part of the transformed (flattened) column name rather than being treated as a column itself. A valid registry for that transform rule is:
{
"dataTag": "occ",
"filePattern": "[a-zA-Z0-9]+.xml",
"fileUploadStrategy": "append",
"defaultRootTag":"Occurrence",
"xmlProcessRules": [
"Occurrence.UserData.UserValue.index=title"
],
"sourceName": "teamcenter"
}
In this case, the index rule defines the transform so that, instead of treating Occurrence.UserData.UserValue_value and Occurrence.UserData.UserValue_title as columns, the system treats Occurrence.UserData.UserValue.OccurrenceName.value as the transformed column.
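As a conceptual illustration only (not the actual Data Contextualization processing code), the following Python sketch shows how the index rule turns the title attribute values from the Occurrence sample above into flattened column names:
import xml.etree.ElementTree as ET

# Sample Occurrence element from the PLMXML snippet above.
xml_snippet = """
<Occurrence id="id33">
  <UserData id="id32" type="AttributesInContext">
    <UserValue value="" title="OccurrenceName"></UserValue>
    <UserValue value="1400" title="SequenceNumber"></UserValue>
    <UserValue value="" title="ReferenceDesignator"></UserValue>
  </UserData>
</Occurrence>
"""

root = ET.fromstring(xml_snippet)
columns = {}
for user_value in root.findall("./UserData/UserValue"):
    # With the rule "Occurrence.UserData.UserValue.index=title",
    # the title value becomes part of the flattened column name.
    title = user_value.get("title")
    columns[f"Occurrence.UserData.UserValue.{title}.value"] = user_value.get("value")

print(columns)
# {'Occurrence.UserData.UserValue.OccurrenceName.value': '',
#  'Occurrence.UserData.UserValue.SequenceNumber.value': '1400',
#  'Occurrence.UserData.UserValue.ReferenceDesignator.value': ''}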
For the ignore rule, the body of the request is:
{
"dataTag": "productrev",
"filePattern": "[a-zA-Z0-9]+.xml",
"fileUploadStrategy": "append",
"defaultRootTag":"ProductRevision",
"xmlProcessRules": [
"ignore=AssociatedDataSet"
],
"sourceName": "teamcenter"
}
In this case, the ignore rule defines an element that is excluded from processing: all elements and sub-elements of AssociatedDataSet are ignored.
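Similarly, a minimal sketch of the effect of the ignore rule: the AssociatedDataSet subtrees are dropped before the remaining attributes are flattened into columns. This only illustrates the described behavior; it is not the actual processing code.
import xml.etree.ElementTree as ET

xml_snippet = """
<ProductRevision id="id79" name="90214255__001__PART_WF" accessRefs="#id4"
                 subType="ItemRevision" masterRef="#id80" revision="aa">
  <AssociatedDataSet id="id81" dataSetRef="#id78" role="PhysicalRealization"/>
  <AssociatedDataSet id="id181" dataSetRef="#id180" role="PhysicalRealization"/>
  <AssociatedDataSet id="id205" dataSetRef="#id204" role="PhysicalRealization"/>
</ProductRevision>
"""

root = ET.fromstring(xml_snippet)

# Rule "ignore=AssociatedDataSet": drop those elements and their subtrees.
for child in root.findall("AssociatedDataSet"):
    root.remove(child)

# Only the remaining ProductRevision attributes are flattened into columns.
columns = {f"ProductRevision.{attr}": value for attr, value in root.attrib.items()}
print(columns)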
Creating custom data types during schema generation¶
The customer wants to derive a sample regular expression from sample data and create their own custom data types that the Data Contextualization system should use during schema generation. Some of the data contains employee email addresses.
This can be done using the endpoint:
POST /api/sdi/sdi_version/suggestPatterns
With the example values, the request URL is:
/api/sdi/sdi_version/suggestPatterns?sampleValues=myrealemployee@realemail.com&testValues=anothertrueamployee@realemail.com, notmyemail@notmyemail.com
Two patterns are generated. The customer can register a pattern using the register data type endpoints.
[
{
"schema": "[a-z]+[@][a-z]+email.com",
"matches": [false,true],
"schemaValid": true
},
{
"schema": "[a-z]+[@][a-z]+[\\.][a-z]+",
"matches": [false, true],
"schemaValid": true
}
]
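As a sketch, the endpoint can be called as shown below; the base URL, token handling, and the exact query parameter formatting are assumptions and should be checked against the API reference.
import requests

BASE_URL = "https://gateway.example.com/api/sdi/v4"   # hypothetical gateway
HEADERS = {"Authorization": "Bearer <your-token>"}

params = {
    "sampleValues": "myrealemployee@realemail.com",
    "testValues": "anothertrueamployee@realemail.com,notmyemail@notmyemail.com",
}

resp = requests.post(f"{BASE_URL}/suggestPatterns", params=params, headers=HEADERS)
resp.raise_for_status()
for suggestion in resp.json():
    print(suggestion["schema"], suggestion["matches"], suggestion["schemaValid"])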
Searching schema¶
Data corresponding to inventory parts is fed into Data Contextualization from an ERP system. The ingested file is a CSV file. The Search Schema POST method returns the schema of this ingested file with attribute names and data types. Using the POST method /searchSchemas, schemas can be retrieved for files whose ingest job status is complete. The request payload can take one of the following forms:
All elements in the schemas list must be homogeneous; each element must contain the same combination of parameters (dataTag, sourceName, and schemaName):
{
  "schemas": [
    {
      "dataTag": "string",
      "schemaName": "string",
      "sourceName": "string"
    }
  ]
}
Each element may also include metaDataTags (a combination of dataTag, sourceName, metaDataTags, and schemaName):
{
  "schemas": [
    {
      "dataTag": "string",
      "schemaName": "string",
      "sourceName": "string",
      "metaDataTags": ["string"]
    }
  ]
}
Or each element may contain only metaDataTags:
{
  "schemas": [
    {
      "metaDataTags": ["string"]
    }
  ]
}
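A minimal sketch of calling the search endpoint with the first payload variant; the base URL, the token, and the schemaName value are placeholders for illustration.
import requests

BASE_URL = "https://gateway.example.com/api/sdi/v4"   # hypothetical gateway
HEADERS = {"Authorization": "Bearer <your-token>", "Content-Type": "application/json"}

payload = {
    "schemas": [
        {
            "dataTag": "plantprocess",
            "schemaName": "plant_plantprocess",   # hypothetical schema name
            "sourceName": "plant",
        }
    ]
}

resp = requests.post(f"{BASE_URL}/searchSchemas", json=payload, headers=HEADERS)
resp.raise_for_status()
print(resp.json())   # attribute names and data types of the ingested file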
Example of creating a data registry for two cross-domain data sources as an IDL user¶
The customer is interested in analyzing Design and Plant data, and first creates a data registry for those two sources. Assume the Design data contains XML files and the Plant data contains CSV files, so the customer creates a data tag for each source. The customer also wants to append the data from files generated by the design team, so that a new version is created when the input XML file differs structurally, or data is appended when only the data points change and the schema remains unchanged. The customer wants to replace the schema and data for the Plant input files.
POST /api/sdi/sdi_version/dataRegistries
{
  "dataTag": "classification",
  "defaultRootTag": "ClassificationCode",
  "filePattern": "[a-zA-Z]+.xml",
  "fileUploadStrategy": "append",
  "sourceName": "Design",
  "metaDataTags": ["teamcenter"]
}
POST /api/sdi/sdi_version/dataRegistries
{
  "dataTag": "plantprocess",
  "filePattern": "[a-zA-Z]+.csv",
  "fileUploadStrategy": "replace",
  "sourceName": "Plant",
  "metaDataTags": ["USAPlant"]
}
Once the data registries are created, the customer can perform the upload based on the generated data registries using IDL, providing the registryId created above.
Example of Schema Evolution¶
This section explains how schema evolution works for the given input files and data registry.
Data Registry: NHTSA (source), Vehicle (data tag)
Ingested File Sequence and test data¶
- File Name: vehicle_202001.csv contains sample data:
ID | Name | MfgDate |
---|---|---|
12345 | AwesomeCar | 12:20:2015 |
34555 | AnotherAwesomeCar | 13:01:2016 |
32131 | AnotherAwesomeCar | 01:12:2019 |
GeneratedSchema:
id: integer
name: string
mfgdate: timestamp
- File Name: vehicle_202002.csv contains sample data:
ID | Name | MfgDate |
---|---|---|
34-456 | OKCar | 12:20:2020 |
34555 | AnotherAwesomeCar | 13:01:2016 |
32131 | AnotherAwesomeCar | 01:12:2019 |
GeneratedSchema:
id: string
name: string
mfgdate: timestamp
The type of the property id is changed from integer to string because the record count is still within the 500-record limit, so Data Contextualization allows the schema to evolve and the data type to change.
- File Name: vehicle_202003.csv contains sample data:
ID | Name | Price | MfgDate |
---|---|---|---|
34-456 | OKCar | 25000 | 12:20:2020 |
34555 | AnotherAwesomeCar | 50000 | 13:01:2016 |
32131 | AnotherAwesomeCar | 55000 | 01:12:2019 |
GeneratedSchema:
id: string
name: string
mfgdate: timestamp
price: integer
In this case, the schema evolves and the new column price is added.
- File Name: vehicle_202004.csv contains sample data:
ID | Name | Price | MfgDate |
---|---|---|---|
34-567 | GreatCar | 65810.45 | Unknown |
34555 | AnotherAwesomeCar | 50000 | 13:01:2016 |
32131 | AnotherAwesomeCar | 55000 | 01:12:2019 |
GeneratedSchema:
id: string
name: string
mfgdate: string
price: float
The type of the property mfgdate is changed from timestamp to string, and the type of the property price is changed from integer to float, because the record count is still within the 500-record limit, so Data Contextualization allows the schema to evolve and the data types to change.
- File Name: vehicle_202005.csv contains sample data (ingested after the 500-record limit is reached):
ID | Name | Price | MfgDate |
---|---|---|---|
34-789 | AwesomeCar | Unavailable | Unknown |
34555 | AnotherAwesomeCar | 50000 | 13:01:2016 |
32131 | AnotherAwesomeCar | 55000 | 01:12:2019 |
This results in an error because Data Contextualization cannot convert the column price to string: the existing type is float, the incoming type is string, and the record count has reached 500, so a type change is no longer allowed during schema evolution.
- File Name: vehicle_202006.csv contains sample data (ingested after the 500-record limit is reached):
ID | Name | Price | MfgDate |
---|---|---|---|
2356 | AwesomeCar | 95000 | 13:01:2024 |
34555 | AnotherAwesomeCar | 50000 | 13:01:2016 |
32131 | AnotherAwesomeCar | 55000 | 01:12:2019 |
GeneratedSchema:
id: string (the incoming type is changed to the encompassing existing type string)
name: string
mfgdate: string (the incoming type is changed to the encompassing existing type string)
price: float
The types of the properties id and mfgdate remain string because the existing type is string, so the incoming data type is converted to string to keep the schema consistent.
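The evolution behavior described above can be summarized with a small sketch. This is only an illustration of the documented rules (type widening within the first 500 records, no incompatible changes afterwards), not the actual Data Contextualization logic.
# Illustration of the documented schema-evolution rules (not the actual SDI code).
WIDENING = {
    ("integer", "float"): "float",      # numeric widening, e.g. price 50000 -> 65810.45
    ("integer", "string"): "string",    # e.g. id 12345 -> 34-456
    ("float", "string"): "string",
    ("timestamp", "string"): "string",  # e.g. mfgdate 12:20:2015 -> Unknown
}

RECORD_LIMIT = 500  # documented limit for allowing type changes

def evolve(existing: str, incoming: str, records_seen: int) -> str:
    if existing == incoming:
        return existing
    if records_seen < RECORD_LIMIT and (existing, incoming) in WIDENING:
        return WIDENING[(existing, incoming)]
    if incoming in ("integer", "float", "timestamp") and existing == "string":
        # Incoming values can still be represented by the existing string type.
        return "string"
    raise ValueError(f"incompatible type change {existing} -> {incoming} after limit")

print(evolve("integer", "string", records_seen=100))  # 'string' (within limit)
print(evolve("string", "integer", records_seen=900))  # 'string' (kept consistent)
# evolve("float", "string", records_seen=900) would raise, as in vehicle_202005.csv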
IngestJobStatus: Get a list of jobIds for ingested files; a jobId can be used to retrieve the detailed status using ingestJobStatus. The jobIds can be filtered based on the specified filter criteria. When multiple filter criteria are provided, the system returns only the jobIds that conform to all of the specified criteria.
Filter Parameters:
- startedDate: Filter based on startedDate. Returns the list of ingest jobs whose startedDate is greater than the given value, e.g. 1970-01-01T00:00:00.000Z.
- finishedDate: Filter based on finishedDate. Returns the list of ingest jobs whose finishedDate is less than the given value, e.g. 1970-01-01T00:00:00.000Z.
- status: Filter based on status.
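As a hedged sketch, the filters could be combined in a request like the following; the exact endpoint path, the filter parameter name, its JSON syntax, and the response shape are assumptions and must be verified against the API reference for your Data Contextualization version.
import requests

BASE_URL = "https://gateway.example.com/api/sdi/v4"   # hypothetical gateway
HEADERS = {"Authorization": "Bearer <your-token>"}

# Hypothetical filter usage: jobs started after the given date with a given status.
params = {
    "filter": '{"startedDate": "2024-01-01T00:00:00.000Z", "status": "FINISHED"}',
}

resp = requests.get(f"{BASE_URL}/ingestJobStatus", params=params, headers=HEADERS)
resp.raise_for_status()
for job in resp.json().get("ingestJobStatus", []):   # response shape is an assumption
    print(job)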