The Datasets API focuses on delivering large amounts of data for statistical analysis, machine learning modeling, and data warehousing. It is primarily designed for daily data loads, but nothing technically prevents loading more or less often. Depending on the source, the underlying data is updated on varying schedules.
The Datasets API uses HTTPS requests like most RESTful APIs. However, the underlying infrastructure is somewhat more flexible, and specific optimizations have been made to speed up delivery and minimize data size. In the future, alternative interfaces will be developed, e.g., FTP transfer or e-mail (for smaller datasets).
The Datasets API consists of multiple endpoints which are divided into logical groupings. As of October 2020, the following groupings of Datasets API endpoints are available:
Each of these differs in what data it offers, and also in the granularity at which the data is delivered.
There is also an "Experimental" datasets endpoint group which contains legacy endpoints, or brand new API endpoints not ready for production. We discourage the use of these in production, since they may be changed or removed at any moment. When API endpoints in this group are ready for production, they will be moved to an appropriate Datasets API endpoint group.
In addition to the datasets endpoint groups, there are some system-specific endpoint groups:
These let you test your connection, get information about your client, or set up custom collections of companies, which we call company batches.
The following section describes how to use the API. It will guide you through the implementation of a basic script for downloading datasets to files on disk.
At Enin we use Python as our primary backend programming language, so examples will be in that language. If you'd like help integrating with your own tool or language, we are more than happy to guide you through the process. If we are familiar with your tool or programming language, we can even help port the examples. That said, all operations here are in essence just HTTPS requests, so if your tool or programming language supports these, you should be able to consume the data with little effort.
The only external dependency is the requests library. It can be installed via pip:
pip3 install requests
Now that our prerequisites are in order, we will start our Python script.
Create a file named datasets_script.py, and include the following imports:
import requests
import json
from datetime import datetime, timedelta
These imports will be used later. Continue to build on this script as we move along. Also we'll be using a custom helper function to print some objects.
def print_obj(obj):
    print(json.dumps(obj, indent=4, ensure_ascii=False))
You can alternatively just use the built-in print() function or the pprint module.
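As a small illustration of why the helper passes ensure_ascii=False, the sketch below formats an object containing non-ASCII characters both ways. The name format_obj and the sample dictionary are made up for illustration and are not real API output.

```python
import json


def format_obj(obj, ensure_ascii=False):
    """Return the JSON text that print_obj would print for obj."""
    return json.dumps(obj, indent=4, ensure_ascii=ensure_ascii)


# Hypothetical sample object, not real API output.
sample = {"name": "Café Ålesund"}

readable = format_obj(sample)                    # keeps the accented letters as-is
escaped = format_obj(sample, ensure_ascii=True)  # escapes them as \uXXXX sequences

print(readable)
print(escaped)
```

Both strings decode to the same object; the first is simply easier to read in a terminal.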
Note that we are using basic Python scripting here without bells and whistles like a main entry point, classes, methods, or functions. This is to keep the code clear for users who aren't familiar with Python's idioms. You should of course use an appropriate coding structure for your purposes. At some later time, we will include a properly designed Python client, which can be used directly.
Now that we have a script file, let's focus on authentication, i.e., the process by which we verify that you really are who you say you are.
Authentication is done with expirable Access Tokens or Basic HTTP Authentication. We recommend using tokens for production systems. In the future we will require IP whitelisting if you use basic authentication.
Note: This tutorial is self-contained, so you should be able to just follow along, including the basic authentication process. If you need extra insight or some help, you can follow our getting started documentation, or contact us at team@enin.ai or on the Enin External Slack if we have set up an account for you.
Now, time to make our first request. Let's check that you can reach the Datasets API at all - without authentication:
system_status = requests.get("https://api.enin.ai/datasets/v1/system-status").json()
print_obj(system_status)
This should print something like the following:
{
    "message": "Enin Dataset API is operational.",
    "python_version": "3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]"
}
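The one-liner above assumes the request succeeds. A minimal sketch of a more defensive variant follows, assuming you want a timeout and an exception on HTTP error statuses. The helper name fetch_json and its http_get parameter are hypothetical and not part of the Enin API; the timeout argument and raise_for_status() are standard features of the requests library.

```python
def fetch_json(url, auth=None, timeout=10.0, http_get=None):
    """GET a URL and return the decoded JSON body.

    fetch_json and http_get are hypothetical names for this sketch.
    http_get defaults to requests.get; passing a stand-in makes the
    helper testable without network access.
    """
    if http_get is None:
        import requests  # imported lazily so a stand-in can replace it
        http_get = requests.get
    response = http_get(url, auth=auth, timeout=timeout)
    # With requests, this raises requests.HTTPError on 4xx/5xx responses
    # instead of silently handing back an error body.
    response.raise_for_status()
    return response.json()
```

With this in place, the status check could be written as fetch_json("https://api.enin.ai/datasets/v1/system-status"), and the authenticated requests later in this tutorial could pass auth=auth in the same way.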
Next, let's include authentication. For this tutorial we will be using
basic authentication, i.e., just a username and password. These are
called Basic Auth Client ID and Basic Auth Client Secret in your
provided credentials file. If you haven't received a credentials file
yet, then either contact us at team@enin.ai or
speak to someone at your organization who has received the credentials
file.
Please take care to store the credentials file properly, i.e., encrypted
and inaccessible to other parties. When using the credentials in your
software, remember to always follow best practices for secret handling. In our
example we will be storing the credentials as a read-restricted JSON
file named .auth.json located next to the Python script
datasets_script.py. You can technically hard-code the client id
and secret in your source code; however, we strongly discourage it.
The .auth.json file should contain the following JSON data, where
YOUR_CLIENT_ID and YOUR_CLIENT_SECRET are replaced appropriately:
{
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET"
}
If you are running on Linux/Ubuntu you can restrict this file to your user by running:
sudo chmod 600 .auth.json
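If you create the credentials file from Python rather than by hand, the same restriction can be applied there. The following is a minimal sketch for POSIX systems, using the hypothetical helper name write_private_json. On Windows, os.chmod only affects the read-only flag, so this does not give the same protection there.

```python
import json
import os
import stat


def write_private_json(path, data):
    """Write data as JSON to a file only the owner can read and write."""
    # Create the file with mode 0o600 from the start, so a new file is
    # never briefly readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as file:
        json.dump(data, file, indent=4)
    # If the file already existed, os.open keeps its old mode, so set
    # the mode explicitly as well.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

A call such as write_private_json(".auth.json", {"client_id": "YOUR_CLIENT_ID", "client_secret": "YOUR_CLIENT_SECRET"}) would produce the file described above.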
If you are using a source control system like
Git, make sure not to commit this file. In Git you can do this by
adding the following line to the end of your .gitignore file:
.auth.json
Now, let's load the authentication information from the file.
with open('.auth.json') as file:
    auth_json = json.load(file)
client_id = auth_json['client_id']
client_secret = auth_json['client_secret']
auth = (client_id, client_secret)
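The snippet above fails with a bare KeyError if the file is incomplete. Here is a sketch of a slightly more defensive loader, assuming the same .auth.json layout as described earlier. The function name load_auth is hypothetical.

```python
import json


def load_auth(path=".auth.json"):
    """Return a (client_id, client_secret) tuple read from a JSON file."""
    with open(path) as file:
        auth_json = json.load(file)
    # Treat absent and empty values alike as missing.
    missing = [
        key for key in ("client_id", "client_secret") if not auth_json.get(key)
    ]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return (auth_json["client_id"], auth_json["client_secret"])
```

The returned tuple can be used exactly like the auth variable in the rest of this tutorial.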
Next, we will try to access an authentication-guarded endpoint.
auth_status = requests.get(
    "https://api.enin.ai/datasets/v1/auth-status",
    auth=auth,
).json()
print_obj(auth_status)
If all goes well, then this should print:
{
    "message": "You are authenticated."
}
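For tools without built-in Basic authentication support, it can help to know what the auth tuple turns into. Per RFC 7617, the client sends an Authorization header containing the word Basic followed by the Base64 encoding of the client id and secret joined by a colon. A minimal sketch, using the hypothetical helper name basic_auth_header:

```python
import base64


def basic_auth_header(client_id, client_secret):
    """Build the HTTP Basic Authorization header for the given credentials."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
```

For ASCII credentials, passing headers=basic_auth_header(client_id, client_secret) to requests.get should have the same effect as passing auth=auth.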
Let's have a look at our API client identity:
api_client_identity = requests.get(
    "https://api.enin.ai/datasets/v1/api-client-identity",
    auth=auth,
).json()
print_obj(api_client_identity)
This will give a summary of some basic identity information about your API Client.
An expanded version of this endpoint is its "composite" version,
api-client-identity-composite. Let's have a look.
api_client_identity_composite = requests.get(
    "https://api.enin.ai/datasets/v1/api-client-identity-composite",
    auth=auth,
).json()
print_obj(api_client_identity_composite)
This should print out something like the following:
[
    {
        "api_client": {
            "uuid": ...,
            "company_uuid": "b1ba011f-6350-45b3-9b0a-f02c07c0cff5",
            ...
        },
        "api_client_identity": {
            "uuid": ...,
            "company_uuid": "b1ba011f-6350-45b3-9b0a-f02c07c0cff5",
            ...
        },
        "app_customer": {
            "app_url": "https://app.enin.ai/settings/customer/b1ba011f-6350-45b3-9b0a-f02c07c0cff5",
            "app_user_email": ...,
            "app_user_uuid": ...,
            "desensitize_flag": 0,
            "long_name": "Bank of Testing",
            "name": "Test Bank",
            "picture": "https://i.imgur.com/oXbiJYz.png",
            "slug": "test",
            "uuid": "b1ba011f-6350-45b3-9b0a-f02c07c0cff5"
        },
        "app_url": "https://app.enin.ai/settings/customer/b1ba011f-6350-45b3-9b0a-f02c07c0cff5",
        "uuid": ...
    }
]
Notice that we still get the api_client_identity entity, but now
also api_client and app_customer entities. This is a general pattern
you will find in the Enin APIs. Any time you see -composite in an
endpoint, think of it as a version of an existing "base" entity which
has been expanded to include more information, yet still retains the
granularity of the base entity. We call this base entity the "subject"
entity of the composite.
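To make the pattern concrete, the sketch below separates the subject entity of a composite record from the entities attached to it. It assumes, based only on the example response above, that the subject's key is the endpoint name without -composite and with hyphens replaced by underscores. The helper names are hypothetical, and the sample record is abbreviated with placeholder values.

```python
def subject_key(endpoint_name):
    """Guess the subject entity key from a composite endpoint name (assumption)."""
    suffix = "-composite"
    if not endpoint_name.endswith(suffix):
        raise ValueError(f"not a composite endpoint: {endpoint_name}")
    return endpoint_name[: -len(suffix)].replace("-", "_")


def split_composite(record, endpoint_name):
    """Return (subject_entity, related_entities) for one composite record."""
    key = subject_key(endpoint_name)
    subject = record[key]
    # Keep only nested entities (dicts); top-level scalars such as "uuid"
    # and "app_url" belong to the composite record itself.
    related = {
        name: value
        for name, value in record.items()
        if name != key and isinstance(value, dict)
    }
    return subject, related


# Abbreviated, made-up record in the shape of the response above.
record = {
    "api_client": {"uuid": "placeholder-1"},
    "api_client_identity": {"uuid": "placeholder-2"},
    "app_customer": {"name": "Test Bank", "slug": "test"},
    "uuid": "placeholder-3",
}
subject, related = split_composite(record, "api-client-identity-composite")
print(sorted(related))  # ['api_client', 'app_customer']
```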
Based on the information in the app_customer entity, it seems we are part
of the Bank of Testing customer organization. Let's see if there are
more API clients registered for this customer. The uuid of the
app_customer entity is b1ba011f-6350-45b3-9b0a-f02c07c0cff5, and
the customer slug is test. We could use either to query for all of
this customer's api_client_identity entities. Let's use the uuid:
app_customer_api_client_identity_composite = requests.get(
    "https://api.enin.ai/datasets/v1/app-customer/b1ba011f-6350-45b3-9b0a-f02c07c0cff5/api-client-identity-composite",
    auth=auth,
).json()
print_obj(app_customer_api_client_identity_composite)
This returned the same object as before, which means this is the only API Client for this customer organization. If there were more, they would be listed as well.
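Rather than hard-coding the uuid, it can be read from the earlier composite response. This is a sketch assuming the response shape shown above. The helper name is hypothetical, and the URL pattern is copied from the request above.

```python
BASE_URL = "https://api.enin.ai/datasets/v1"


def customer_identity_composite_url(composite_response):
    """Build the per-customer composite URL from a composite response."""
    # The response is a list; this sketch uses the first record's customer.
    app_customer_uuid = composite_response[0]["app_customer"]["uuid"]
    return (
        f"{BASE_URL}/app-customer/{app_customer_uuid}"
        "/api-client-identity-composite"
    )
```

The result of customer_identity_composite_url(api_client_identity_composite) can then be passed to requests.get together with auth=auth.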
Note: There are few to no secrets within a customer organization. Any application user or API client can see most, if not all, information about their customer organization. If you need information barriers (so-called Chinese walls), or otherwise need to separate information between members of a customer organization, we will help you set up a sister customer organization.