AWS¶
Dependencies¶
Setup¶
First, you’ll need some AWS credentials. Without these you can only access public S3 buckets. Once you have those, S3 interaction will work. For other services such as Redshift, the setup is a bit more involved.
Once you have some AWS credentials, you’ll need to put those in a config file. Boto has a nice doc page on how to set this up.
Now that you have a boto config, we’re ready to interact with AWS.
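For reference, a boto config is a small INI file with a [Credentials] section, usually saved as ~/.boto. The sketch below builds one in memory with Python's standard configparser; the key values are placeholders, not real credentials, and writing the text to disk is left to you.

```python
import configparser
import io

# Placeholder values (assumptions): substitute your own keys and keep them out of version control.
config = configparser.ConfigParser()
config["Credentials"] = {
    "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
    "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
}

# Render the INI text in memory; save it as ~/.boto when you are happy with it.
buffer = io.StringIO()
config.write(buffer)
boto_config_text = buffer.getvalue()
print(boto_config_text)
```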
Interface¶
odo provides access to the following AWS services:
- S3 via boto
- Redshift via a SQLAlchemy dialect
URIs¶
To access an S3 key, provide the path to the key prefixed with s3://
>>> from odo import resource
>>> csvfile = resource('s3://bucket/key.csv')
S3 commonly uses a prefix to limit an operation to a subset of keys. We can simulate a glob of keys by combining a prefix with the * character:
>>> csv_glob = resource('s3://bucket/prefix*.csv')
This will match all keys starting with prefix and ending with the .csv extension. The result csv_glob can be used just like a glob of files from your local disk.
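To make the matching rule concrete, here is a minimal sketch that applies the same kind of pattern to a list of key names using Python's standard fnmatch module. The key names are made up, and this only illustrates the matching semantics; it is not a claim about how odo or boto list keys internally.

```python
from fnmatch import fnmatchcase


def match_keys(keys, pattern):
    """Return the key names that match a shell-style pattern (case-sensitive)."""
    return [key for key in keys if fnmatchcase(key, pattern)]


# Hypothetical key names in a bucket.
keys = ["prefix_2014.csv", "prefix_2015.csv", "prefix_notes.txt", "other.csv"]

matched = match_keys(keys, "prefix*.csv")
print(matched)
```

Only the two keys that both start with prefix and end with .csv are kept.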
Accessing a Redshift database works the same way as accessing it through SQLAlchemy:
>>> db = resource('redshift://user:pass@host:port/database')
To access an individual table, append :: plus the table name:
>>> table = resource('redshift://user:pass@host:port/database::table')
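The shape of these URIs can be illustrated by pulling one apart with the standard library. This is only a sketch of the URI structure, assuming a numeric port (5439 is used here as an example value); split_redshift_uri is a hypothetical helper and not part of odo.

```python
from urllib.parse import urlsplit


def split_redshift_uri(uri):
    """Split 'redshift://user:pass@host:port/database::table' into its parts."""
    # Everything after '::' names the table; the rest is a SQLAlchemy-style URI.
    database_uri, _, table = uri.partition("::")
    parts = urlsplit(database_uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "table": table or None,  # None when no '::table' suffix is given
    }


info = split_redshift_uri("redshift://user:pass@host:5439/database::table")
print(info)
```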
Conversions¶
odo can take advantage of Redshift’s fast S3 COPY command. It works transparently. For example, to upload a local CSV file called users.csv to a Redshift table:
>>> table = odo('users.csv', 'redshift://user:pass@host:port/db::users')
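For readers unfamiliar with COPY, the sketch below builds the kind of statement Redshift accepts for loading a CSV that already sits in S3. The bucket location, the placeholder keys, and the build_copy_statement helper are all assumptions for illustration; the statement odo actually issues may differ, and real code should never interpolate untrusted values into SQL like this.

```python
def build_copy_statement(table, s3_uri, access_key, secret_key, skip_header=True):
    """Build a Redshift COPY statement for a CSV file stored in S3 (illustrative only)."""
    credentials = "aws_access_key_id={};aws_secret_access_key={}".format(
        access_key, secret_key
    )
    statement = "COPY {} FROM '{}' CREDENTIALS '{}' CSV".format(
        table, s3_uri, credentials
    )
    if skip_header:
        # Skip the first line of the file, which holds the column names.
        statement += " IGNOREHEADER 1"
    return statement + ";"


# Hypothetical bucket and placeholder keys.
sql = build_copy_statement(
    "users", "s3://bucket/users.csv", "YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY"
)
print(sql)
```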
Remember that these are just additional nodes in the odo network, and as such, they are able to take advantage of conversions to types that don’t have an explicit path defined for them. This allows us to do things like convert an S3 CSV to a pandas DataFrame:
>>> import pandas as pd
>>> from odo import odo
>>> df = odo('s3://mybucket/myfile.csv', pd.DataFrame)
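The idea of reaching a target type through intermediate nodes can be sketched with a toy graph search. The node names and edges below are hypothetical, and odo's real conversion graph, costs, and path selection may differ; the point is only that a route can be found even when no direct S3-to-DataFrame edge exists.

```python
from collections import deque

# Hypothetical conversion edges: each key can be converted directly to the listed types.
CONVERSIONS = {
    "S3(CSV)": ["CSV"],
    "CSV": ["DataFrame", "S3(CSV)"],
    "DataFrame": ["CSV"],
}


def find_path(graph, source, target):
    """Breadth-first search for the shortest chain of conversions, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = find_path(CONVERSIONS, "S3(CSV)", "DataFrame")
print(path)
```

Here the search goes through the local CSV node, since the toy graph has no direct edge from S3(CSV) to DataFrame.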
TODO¶
- Multipart uploads for huge files
- GZIP’d files
- JSON to Redshift (JSONLines would be easy)
- boto get_bucket hangs on Windows