Amazon S3 support: Read and write S3 bucket data

Modified on Tue, 21 Dec 2021 at 01:16 PM

Introduction

Omniscope supports reading data from, and writing data to, an Amazon S3 bucket. An Amazon S3 bucket is a public cloud storage resource available in Amazon Web Services (AWS). You can use S3 buckets to store and protect any amount of data.


This functionality was introduced in a daily build in December 2021, and will be available in the first Rock build in 2022.


This article provides an overview of the Amazon S3 functionality in Omniscope. You should be familiar with Amazon S3 and have access to an S3 bucket to test reading and writing data.


Credentials

In order for Omniscope to make requests to Amazon Web Services you must use credentials issued by AWS. Typically your credentials will consist of an access key ID and a secret access key. For more information on how to create an AWS access key, see the AWS documentation.


When you select the Amazon S3 file location for reading or writing data, you can either have your credentials identified automatically or enter them explicitly:



If Identify credentials automatically is ticked in the block options, Omniscope will attempt to read your credentials using the following checks (in order):

  1. Environment variables. Omniscope will attempt to load your credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
  2. Shared credentials and config files. Omniscope will attempt to load your credentials from the credentials or config files. A credentials file is the most common way to supply authentication and configuration to an external tool such as Omniscope. You can also store your credentials in the config file, although Amazon recommends using the credentials file instead. On Windows your credentials file should be located at C:\Users\USERNAME\.aws\credentials; on Linux, macOS or Unix it should be located at ~/.aws/credentials. For more information see the AWS documentation.
  3. Amazon EC2 instance profile credentials. Omniscope will attempt to load your credentials from the Amazon EC2 instance metadata service (if Omniscope is installed on an EC2 instance). By default this option is turned OFF, to prevent users from accessing unauthorised S3 buckets when Omniscope is installed on an EC2 instance. You can turn it on by ticking Admin > Settings > Advanced > Allow Amazon AWS credentials to be loaded from the Amazon EC2 metadata service.
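The first two automatic checks can be sketched in Python. This is purely an illustration of the lookup order, not Omniscope's actual implementation; the file format, profile name and key names follow the AWS defaults:

```python
# Illustrative sketch of checks 1 and 2 of the automatic credential lookup:
# environment variables first, then the shared credentials file.
import configparser
import tempfile

def find_credentials(env, credentials_path, profile="default"):
    """Return (access_key_id, secret_access_key), or None if not found."""
    # Check 1: environment variables.
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return key, secret
    # Check 2: the shared credentials file (~/.aws/credentials by default),
    # an INI file with one section per profile.
    parser = configparser.ConfigParser()
    if parser.read(credentials_path) and parser.has_section(profile):
        section = parser[profile]
        if "aws_access_key_id" in section and "aws_secret_access_key" in section:
            return section["aws_access_key_id"], section["aws_secret_access_key"]
    return None

# Demo: with no environment variables set, the credentials file is consulted.
with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as f:
    f.write("[default]\n"
            "aws_access_key_id = AKIAEXAMPLE\n"
            "aws_secret_access_key = examplesecret\n")
    path = f.name

print(find_credentials({}, path))  # ('AKIAEXAMPLE', 'examplesecret')
```

Note how the environment variables, when both are present, short-circuit the file lookup, mirroring the ordering described above.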

Alternatively, you can supply your credentials explicitly by unticking Identify credentials automatically. You will then be prompted to enter your Access key ID and Access key secret:



Region

You need to select the region in which the S3 bucket you want to read from or write to resides. If Identify region automatically is ticked, Omniscope will attempt to look up your region using the following checks (in order):

  1. Environment variable. Omniscope will attempt to read your region from the AWS_REGION environment variable, if set.
  2. The config file. Omniscope will attempt to load the region from the AWS configuration file, if it exists and defines a region. On Windows this is usually located at C:\Users\USERNAME\.aws\config; on Linux it is usually located at ~/.aws/config. For more information see the AWS documentation.
  3. Amazon EC2 instance. Omniscope will attempt to use the Amazon EC2 instance metadata service (if Omniscope is installed on an EC2 instance) to determine the region of the currently running instance.
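For check 2 above, a minimal AWS config file that defines a region looks like this (the region value shown is just an example):

```ini
[default]
region = eu-west-2
```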


Alternatively, you can manually select a region by unticking Identify region automatically. You will then be able to select a region from a drop-down:



Reading data from an Amazon S3 bucket


In your workflow, add a new File block. Click on the block to open the options. In the Location section change the type to Amazon S3 File:



You now need to identify your credentials and region. Please refer to the earlier sections which describe this process.


Click the Confirm button. If your credentials were valid you should now see an option to select a Bucket and Path:



Select a bucket from the drop-down. Alternatively, if it is a public bucket, you can type the bucket name. Now click the Browse button to browse the S3 file system:
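For context, the Bucket and Path fields together identify an object in the same way as the bucket and key components of an s3:// URI. A small illustrative sketch (the helper name is hypothetical, not part of Omniscope):

```python
def split_s3_uri(uri):
    """Split 's3://bucket/path/to/file.csv' into (bucket, key)."""
    prefix = "s3://"
    assert uri.startswith(prefix), "not an s3:// URI"
    # Everything up to the first '/' is the bucket; the rest is the key (path).
    bucket, _, key = uri[len(prefix):].partition("/")
    return bucket, key

print(split_s3_uri("s3://my-data-bucket/reports/2021/sales.csv"))
# ('my-data-bucket', 'reports/2021/sales.csv')
```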



Browse to the file you want to read and click Choose. You can now configure the reading options in the same way you would for a local file. Click the execute button to load the file and click the Data tab to view the data:



Writing data to an Amazon S3 bucket

In your workflow, add a new File output block. Connect the data you want to write to the S3 bucket to the File output block:



Click on the File output block to open the options. Change the Location > Type drop-down to Amazon S3 File:



You now need to identify your credentials and region. Please refer to the earlier sections which describe this process.


Click the Confirm button. If your credentials were valid you should now see an option to select a Bucket and Path:



Select or enter the name of the bucket you want to write to. Click the Browse button to browse and select the file you want to write the data to:



Click Choose. Now configure the format of the data (e.g. Omniscope data table, Excel, CSV) and the write options. Click the execute button to write the data to the S3 bucket.


Pricing

Amazon charges both for storing data in, and for reading/writing data to, an S3 bucket. For information on pricing, see the Amazon S3 pricing page.


We have tried to engineer the S3 functionality in Omniscope to be as efficient as possible, to avoid unnecessary charges. In many cases data can be read using a single request; however, in some cases, such as when using a Join block, Omniscope may need to retrieve the data using multiple requests. If you will be reading data from S3 regularly, you could consider downloading the data locally. In any case you should monitor your S3 charges carefully.
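A back-of-the-envelope sketch of how request counts translate into monthly cost. The per-request rate below is a HYPOTHETICAL placeholder, not a real AWS price; check the current S3 pricing page before relying on any figure:

```python
# Hypothetical rate: USD per 1,000 GET requests. Replace with the real
# figure from the Amazon S3 pricing page for your region and storage class.
GET_RATE_PER_1000 = 0.0004

def monthly_get_cost(reads_per_day, requests_per_read,
                     rate_per_1000=GET_RATE_PER_1000, days=30):
    """Estimate the monthly request cost for a regularly refreshed workflow."""
    total_requests = reads_per_day * requests_per_read * days
    return total_requests / 1000 * rate_per_1000

# Example: a workflow refreshed hourly (24 reads/day), where a Join block
# causes each read to issue 5 requests instead of 1.
print(round(monthly_get_cost(24, 5), 4))  # 0.0014
```

The point of the sketch is the multiplier: a block that issues several requests per read scales the request bill linearly, which is why downloading frequently-read data locally can be cheaper.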

