Uploading data to Integrated Data Lake

This section describes how to upload data to Integrated Data Lake.

Prerequisites

Choose the method that fits your requirement. You can upload data to Integrated Data Lake using any of the following methods:

  1. Generate signed URL for AWS or Shared Access Signatures (SAS) for Azure
  2. Cross account access for AWS or Service Principal for Azure
  3. Security Token Service (STS) for AWS

Generate Signed URL or Shared Access Signatures

To use this method, follow these steps:

  1. Generate a signed URL or Shared Access Signature for the object to upload.

    Endpoint:

    POST /api/datalake/v3/generateUploadObjectUrls
    

    Content-Type: application/json

    Request example:

      {
        "paths": [
          {
            "path": "myfolder/mysubfolder/myobject.objext"
          }
        ]
      }
    
    

    Response examples for AWS, Azure, and Alibaba Cloud, in that order:

      {
          "objectUrls":[
              {
                  "signedUrl":"https://datalake-integ-dide2-5234525690573.s3.eu-central-1.amazonaws.com/data/ten%3Ddide2/myfolder/mysubfolder/myobject.objext?X-Amz-Security-Token=Awervzdg23452xvbxd3434ddg&X-Amz-SignedHeaders=host&X-Amz-Expires=7200&X-Amz-Credentials=ASIATCES50453sdf&X-Amz-Signature=2e2342sfgsdfgsdgh",
                  "path":"myfolder/mysubfolder/myobject.objext"
              }
          ]
      }
    
      {
          "objectUrls":[
              {
                  "Shared Access Signatures":"https://idltntprovisioningrc.blob.core.windows.net/datalake-rc-punrc118/data/ten=punrc118/folder1/mysensordata.log?sv=2018-11-09&spr=https&se=2020-03-06T12%3A57%3A24Z&sr=b&sp=aw&sig=Hz154Gtw6xkZzD5MB7BaiFNx9mcU0ZqswEVdJfcWTUg%3D&2018-01-01T00%3A00%3A00.0000000Z",
                  "path":"folder1/mysensordata.log"
              }
          ]
      }
    
      {
          "objectUrls": [
              {
                  "signedUrl": "https://datalake-integ-cdiot0-1627437476734.oss-cn-shanghai.aliyuncs.com/data/ten%3Dcdiot0/myfolder/mysubfolder/myobject.objext?Expires=1661144053&OSSAccessKeyId=LTAI5t9E24F6VDb6Q4hsyFNk&Signature=sz8e4JG42tQIX3Jg%2BNU1ZPRtlUQ%3D",
                  "path": "myfolder/mysubfolder/myobject.objext"
              }
          ]
      }
    
  2. Use the signed URL or Shared Access Signature to upload one or more objects to the target folder. The URL is valid for 120 minutes for AWS and 720 minutes for Azure. Once it expires, generate a new signed URL or Shared Access Signature.
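
As a rough illustration of step 1, the following Python sketch builds the POST request and reads the upload URLs out of a response shaped like the examples above. Only the endpoint path, the "paths" body, and the response keys come from the examples. The base URL, the bearer-token Authorization header, and the helper names are assumptions made for illustration.

```python
import json
import urllib.request

ENDPOINT = "/api/datalake/v3/generateUploadObjectUrls"


def build_generate_urls_request(base_url, token, object_paths):
    """Build (but do not send) the POST request for the given object paths."""
    body = {"paths": [{"path": p} for p in object_paths]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API is called with a bearer token.
            "Authorization": "Bearer " + token,
        },
    )


def extract_upload_urls(response_text):
    """Map each object path in the response to its upload URL."""
    urls = {}
    for entry in json.loads(response_text).get("objectUrls", []):
        # The AWS example uses the key "signedUrl"; the Azure example
        # uses the key "Shared Access Signatures".
        url = entry.get("signedUrl") or entry.get("Shared Access Signatures")
        urls[entry["path"]] = url
    return urls
```

The request can be sent with urllib.request.urlopen(request), and the response body passed to extract_upload_urls.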

Request samples for AWS and Azure:

PUT https://datalake-integ-dide2-5234525690573.s3.eu-central-1.amazonaws.com/data/ten%3Ddide2/myfolder/mysubfolder/myobject.objext?X-Amz-Security-Token=Awervzdg23452xvbxd3434ddg&X-Amz-SignedHeaders=host&X-Amz-Expires=7200&X-Amz-Credentials=ASIATCES50453sdf&X-Amz-Signature=2e2342sfgsdfgsdgh
PUT https://idltntprovisioningrc.blob.core.windows.net/datalake-rc-punrc118/data/ten=punrc118/folder1/mysensordata.log?sv=2018-11-09&spr=https&se=2020-03-06T12%3A57%3A24Z&sr=b&sp=aw&sig=Hz154Gtw6xkZzD5MB7BaiFNx9mcU0ZqswEVdJfcWTUg%3D&2018-01-01T00%3A00%3A00.0000000Z

For Azure, include the following header in the PUT request:

x-ms-blob-type: BlockBlob
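
The PUT requests above could be issued from Python roughly as follows. This is a minimal sketch using only the standard library. The helper names are hypothetical, and detecting Azure by its blob.core.windows.net host name is an assumption based on the sample URL.

```python
import urllib.request
from urllib.parse import urlsplit


def build_upload_request(upload_url, payload):
    """Build (but do not send) the PUT request that uploads `payload` bytes."""
    headers = {}
    host = urlsplit(upload_url).hostname or ""
    if host.endswith(".blob.core.windows.net"):
        # Azure Blob Storage expects the blob type on an upload request.
        headers["x-ms-blob-type"] = "BlockBlob"
    return urllib.request.Request(
        url=upload_url, data=payload, method="PUT", headers=headers
    )


def upload(upload_url, payload):
    """Send the PUT request and return the HTTP status code."""
    request = build_upload_request(upload_url, payload)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Note that urllib adds a default Content-Type header to requests that carry a body. If the storage provider's signature covers that header, it may need to be set explicitly to the value the URL was signed for.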

Cross account access for AWS only

Use this method if you need continuous access to the desired folder for uploads. For example, an application that resides in your AWS account may need to access an Integrated Data Lake (IDL) folder continuously. Cross account access is useful in such scenarios.

To use this method, follow these steps:

  1. Create a cross account for the AWS account that needs access.

    POST /crossAccounts
    Content-Type: application/json
    

    Request example:

      {
        "name": "testCrossAccount",
        "accessorAccountId": "960568630345",
        "description": "Cross Account Access for Testing",
        "subtenantId": "204a896c-a23a-11e9-a2a3-2a2ae2dbcce4"
      }
    

    Response example:

      {
        "id": "20234sd34a23a-11e9-a2a3-2a2sdfw34ce4",
        "name": "testCrossAccount",
        "accessorAccountId": "960568630345",
        "description": "Cross Account Access for Testing",
        "timestamp": "2019-09-06T21:23:32.000Z",
        "subtenantId": "204a896c-a23a-11e9-a2a3-2a2ae2dbcce4",
        "eTag": 1
      }
    
  2. Once the cross account is created, create a cross account access to grant the desired permission on the desired prefix.

    POST /crossAccounts/20234sd34a23a-11e9-a2a3-2a2sdfw34ce4/accesses
    Content-Type: application/json
    

    Request example:

      {
        "description": "Access to write to mysubfolder",
        "path": "myfolder/mysubfolder",
        "permission": "WRITE"
      }
    

    Response example:

      {
        "id": "781c8b90-c7b6-4b1c-993c-b51a00b35be2",
        "description": "Access to write to mysubfolder",
        "storageAccount": "dlbucketname",
        "storagePath": "data/ten=tenantname/myfolder/mysubfolder",
        "path": "myfolder/mysubfolder",
        "permission": "WRITE",
        "status": "ENABLED",
        "timestamp": "2019-11-04T19:19:25.866Z",
        "eTag": 1
      }
    
  3. Once the access is provided, you can upload data to the desired prefix through the AWS CLI or an AWS SDK.
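
Steps 1 and 2 can be sketched in Python as two request builders. The body fields mirror the request examples above. The api_root argument, the bearer-token Authorization header, and the function names are assumptions for illustration, since the examples show only the relative paths.

```python
import json
import urllib.request


def _post_json(url, token, body):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API is called with a bearer token.
            "Authorization": "Bearer " + token,
        },
    )


def build_create_cross_account_request(
    api_root, token, name, accessor_account_id, description, subtenant_id
):
    """Step 1: register the AWS account that needs access."""
    body = {
        "name": name,
        "accessorAccountId": accessor_account_id,
        "description": description,
        "subtenantId": subtenant_id,
    }
    return _post_json(api_root.rstrip("/") + "/crossAccounts", token, body)


def build_create_access_request(
    api_root, token, cross_account_id, path, permission, description
):
    """Step 2: grant a permission on a prefix to the cross account."""
    url = "{}/crossAccounts/{}/accesses".format(
        api_root.rstrip("/"), cross_account_id
    )
    body = {"description": description, "path": path, "permission": permission}
    return _post_json(url, token, body)
```

The "id" returned by the first call is the cross_account_id to pass to the second.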

Use the following command to upload a file to the S3 bucket:

$ aws s3 cp myobject.objext s3://tgsbucket

upload: ./myobject.objext to s3://tgsbucket/myobject.objext
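
The same upload can be done through the AWS SDK. The following is a minimal sketch using boto3 (the AWS SDK for Python, installed separately). It assumes that the "storageAccount" value from the access response is the bucket name and that "storagePath" is the key prefix to write under. The function names are hypothetical, and credentials are expected to come from the accessor account's default AWS credential configuration.

```python
import os


def build_object_key(storage_path, file_name):
    """Join the storagePath prefix and a file name into an S3 object key."""
    return storage_path.strip("/") + "/" + file_name


def upload_file(local_path, bucket, storage_path):
    """Upload a local file under the prefix the access was granted on."""
    import boto3  # third-party package; imported here so the helper above works without it

    key = build_object_key(storage_path, os.path.basename(local_path))
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key
```

For example, upload_file("myobject.objext", "dlbucketname", "data/ten=tenantname/myfolder/mysubfolder") would write to the prefix shown in the access response above.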