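First, let's talk about the dataset endpoints in general.
Things you can typically do with each endpoint include limiting the number of entries returned, selecting fields, filtering entries, and choosing the file type of the response: csv, json, or jsonl.
Each of these operations is enabled through query strings.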
Before we get started with examples, it might be useful to understand our API metadata design principles. One integral API design philosophy is that the API itself seldom uses individual fields of data, but rather full objects we call Entities. This means you will seldom see custom entities which merge fields from multiple sources. Instead, you'll find entities of one type, or larger composites of entities.
Take accounts data. The most basic Entity for accounts is the
AccountsEntity, which hardly has any information. It just says whether
there is an account at all. It is perfectly possible for us to know
there are accounts for a company without us yet having fetched the
associated data. As of May 2020, the AccountsEntity is defined with
the following fields/metadata:
{
"uuid": String,
"company_uuid": String,
"accounting_year": Integer,
"accounting_announcement_date": Date,
"accounting_schema": String,
"accounting_from_date": Date,
"accounting_to_date": Date,
"accounts_type_uuid": String,
"app_url": String,
}
Using the Datasets API, this data can simply be fetched from
https://api.enin.ai/datasets/v1/dataset/accounts using an HTTP GET
request. Without any extra parameters this will give you all the
available entities of this kind. Of course you might want to filter this,
and we will get to that in the next sections. But another glaring issue
is that the data you get from that endpoint is far from enough to do
anything useful with. So, of course, we have other associated entities
like AccountsBalanceSheetEntity and AccountsIncomeStatementEntity
with their respective fields.
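To make the AccountsEntity field list above a bit more concrete, here is a minimal sketch of how a client could mirror it in Python. The dataclass, the helper `accounts_from_dict`, and the sample values are our own illustration and not part of the API. We also assume that Date fields arrive as ISO 8601 strings (YYYY-MM-DD), which you should verify against a real response.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class AccountsEntity:
    """Hypothetical client-side mirror of the AccountsEntity fields."""
    uuid: str
    company_uuid: str
    accounting_year: int
    accounting_announcement_date: date
    accounting_schema: str
    accounting_from_date: date
    accounting_to_date: date
    accounts_type_uuid: str
    app_url: str


DATE_FIELDS = (
    "accounting_announcement_date",
    "accounting_from_date",
    "accounting_to_date",
)


def accounts_from_dict(raw: dict) -> AccountsEntity:
    """Build an AccountsEntity from one decoded JSON entry.

    Assumption: Date fields are ISO 8601 strings such as "2019-12-31".
    Keys that are not part of the dataclass are ignored.
    """
    known = {f.name for f in fields(AccountsEntity)}
    values = {key: value for key, value in raw.items() if key in known}
    for name in DATE_FIELDS:
        if values.get(name) is not None:
            values[name] = date.fromisoformat(values[name])
    return AccountsEntity(**values)


# Made-up sample entry, for illustration only.
sample = {
    "uuid": "00000000-0000-0000-0000-000000000001",
    "company_uuid": "00000000-0000-0000-0000-000000000002",
    "accounting_year": 2019,
    "accounting_announcement_date": "2020-06-30",
    "accounting_schema": "NO",
    "accounting_from_date": "2019-01-01",
    "accounting_to_date": "2019-12-31",
    "accounts_type_uuid": "00000000-0000-0000-0000-000000000003",
    "app_url": "https://example.invalid/accounts/1",
    "some_extra_key": "ignored",
}
entity = accounts_from_dict(sample)
print(entity.accounting_year, entity.accounting_to_date)
```

A typed mirror like this is optional. Plain dictionaries work just as well, but a dataclass makes it obvious when a field you rely on is missing from a response.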
It would be very cumbersome if you had to fetch each of these
individually. This is where composites enter the scene. Composites are
simply multiple entities merged into one larger entity at the granularity
of the entity in focus, i.e., the subject of the composite. Let's look
at what an AccountsCompositeEntity would look like:
{
"accounts": AccountsEntity,
"accounts_highlights": AccountsHighlightsEntity,
"accounts_type": AccountsTypeEntity,
"accounts_income_statement": AccountsIncomeStatementEntity,
"accounts_balance_sheet": AccountsBalanceSheetEntity,
"company": CompanyEntity,
}
Here you see each element is its own entity. These have their own fields/metadata, so let's expand the first two, AccountsEntity and AccountsHighlightsEntity, to see what is under the hood.
{
"accounts": {
"uuid": String,
"company_uuid": String,
"accounting_year": Integer,
"accounting_announcement_date": Date,
"accounting_schema": String,
"accounting_from_date": Date,
"accounting_to_date": Date,
"accounts_type_uuid": String,
"app_url": String,
},
"accounts_highlights": {
"uuid": String,
"accounts_uuid": String,
"income_statement__operating_revenue": Float,
"income_statement__ebitda": Float,
"income_statement__ebit": Float,
"income_statement__ordinary_result_after_taxes": Float,
"income_statement__net_income": Float,
"balance__intangible_assets": Float,
"balance__fixed_assets": Float,
"balance__cash_and_deposits": Float,
"balance__total_current_assets": Float,
"balance__total_assets": Float,
"balance__equity": Float,
"balance__liabilities": Float,
"current_ratio": Float,
"quick_ratio": Float,
"return_on_assets": Float,
"profit_margin": Float,
"equity_profitability": Float,
"equity_ratio": Float,
"debt_ratio": Float,
"income_statement__ebt": Float,
"income_statement__currency_code": String,
"accounting_year": Integer,
"app_url": String,
},
"accounts_type": AccountsTypeEntity,
"accounts_income_statement": AccountsIncomeStatementEntity,
"accounts_balance_sheet": AccountsBalanceSheetEntity,
"company": CompanyEntity,
}
To download this object you'd do an HTTP GET request to
https://api.enin.ai/datasets/v1/dataset/accounts-composite.
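Once a composite entry has been decoded from JSON, each sub-entity is just a nested dictionary. The sketch below shows one way to pull a few fields out of an AccountsCompositeEntity. The function name `summarize_accounts` and the sample values are made up for illustration, and we assume that a sub-entity may be missing or null when no data exists for it.

```python
def summarize_accounts(composite: dict) -> dict:
    """Pick a few fields out of one decoded accounts-composite entry."""
    accounts = composite["accounts"]
    # Assumption: sub-entities other than the subject may be absent or null.
    highlights = composite.get("accounts_highlights") or {}
    return {
        "accounts_uuid": accounts["uuid"],
        "accounting_year": accounts["accounting_year"],
        "operating_revenue": highlights.get("income_statement__operating_revenue"),
        "ebitda": highlights.get("income_statement__ebitda"),
    }


# Made-up sample entry, for illustration only.
sample_composite = {
    "accounts": {"uuid": "a-1", "company_uuid": "c-1", "accounting_year": 2019},
    "accounts_highlights": {
        "income_statement__operating_revenue": 1000.0,
        "income_statement__ebitda": 250.0,
    },
}
summary = summarize_accounts(sample_composite)
print(summary)
```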
As you can see, there are a lot of fields. The sheer volume of metadata in our system is why we seldom operate on and manage individual fields. We define the metadata once, centrally, and reuse that definition across our platform. If we had to manage fields individually all the time, Enin would not be doing much more than metadata curation. That said, as a consumer of our data you want to manage the fields you are interested in, not all of the data all of the time. Because of this, we have created some tools for selecting fields and filtering entries to suit your needs. This will be the focus of the next sections.
While discussing entities, we've already looked at some basic API
endpoints. Now let's be explicit about how to use them. We will be
fetching entities of type CompanyEntity and CompanyCompositeEntity
from the endpoints
https://api.enin.ai/datasets/v1/dataset/company
and
https://api.enin.ai/datasets/v1/dataset/company-composite,
respectively.
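Since all the dataset endpoints share the same URL pattern, it can be handy to build the URLs with a small helper instead of concatenating strings by hand. The following is a minimal sketch using only the standard library. The helper name `dataset_url` is our own and not part of any client library, and `limit` and `file_type` are the query parameters used in the examples in this section.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.enin.ai/datasets/v1/dataset"


def dataset_url(dataset: str, **params) -> str:
    """Return the URL for a dataset endpoint, with optional query parameters."""
    url = f"{BASE_URL}/{dataset}"
    if params:
        # urlencode takes care of escaping special characters in the values.
        url += "?" + urlencode(params)
    return url


print(dataset_url("company"))
print(dataset_url("company-composite", limit=3, file_type="csv"))
```

If you use the requests library, passing a dictionary as `params=` to `requests.get` achieves the same thing without building the query string yourself.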
Let's start by requesting 3 entities of type CompanyEntity. You can
limit the number of entries returned to 3 by adding the query string
?limit=3:
companies = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company?limit=3",
    auth=auth,
).json()
print_obj(companies)
By default this could print something like the following:
[
{
"insert_timestamp": "2019-08-05T00:18:08.349297+00:00",
"name": "EIENDOMSMEGLER 1 HEDMARK EIENDOM AS AVD GJØVIK",
"org_nr": "972150797",
"org_nr_schema": "NO",
"update_timestamp": "2019-08-05T00:18:08.349297+00:00",
"uuid": "14d39ef2-d1a1-46b9-8571-440d23dbb6d0"
},
{
"insert_timestamp": "2019-07-24T07:34:44.004922+00:00",
"name": "SPAREBANKEN SOGN OG FJORDANE AVD FLORØ",
"org_nr": "973167529",
"org_nr_schema": "NO",
"update_timestamp": "2019-07-24T07:34:44.004922+00:00",
"uuid": "b5b222fa-a9fe-4c04-bc9f-c9a9f9c32f44"
},
{
"insert_timestamp": "2018-10-16T12:33:39.528266+00:00",
"name": "Unknown Name",
"org_nr": "927822334",
"org_nr_schema": "NO",
"update_timestamp": "2018-10-16T12:33:39.528266+00:00",
"uuid": "59866d11-405e-44f2-9c70-f6345b2c9e4f"
}
]
Notice that this returns valid JSON. If you are downloading only a few entries and the data fits into memory, this is a good format to use, and there are plenty of libraries which handle it nicely. However, JSON is a little annoying to work with when it gets large. The opening and closing square brackets, and the N-1 commas delimiting the N entities, make it hard to stream. There are of course tools which handle this, but to make things easier we also support the JSONL format, which has one JSON object (entity) per line, and CSV, which also has one entity per line and is a lot more compact (it doesn't repeat the field names for every entity).
Let's try JSONL:
companies = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company?limit=3&file_type=jsonl",
    auth=auth,
).content.decode()
print(companies)
This outputs:
{"name": "EIENDOMSMEGLER 1 HEDMARK EIENDOM AS AVD GJØVIK", "uuid": "14d39ef2-d1a1-46b9-8571 ... }
{"name": "SPAREBANKEN SOGN OG FJORDANE AVD FLORØ", "uuid": "b5b222fa-a9fe-4c04-bc9f-c9a9f9c ... }
{"name": "Unknown Name", "uuid": "59866d11-405e-44f2-9c70-f6345b2c9e4f", "org_nr": "9278223 ... }
If you would like to stream this using Python you could do:
import json

companies = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company?limit=3&file_type=jsonl",
    auth=auth,
    stream=True,
)
for line in companies.iter_lines():
    if not line:
        continue  # skip blank keep-alive lines
    payload = json.loads(line.decode())
    print_obj(payload)
A similar strategy can work with other tools as well.
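The line-by-line idea is independent of the HTTP library. As a rough sketch, the generator below parses JSONL from any iterable of byte lines, such as the result of `iter_lines()` on a streamed response or a file opened in binary mode. The name `iter_jsonl` is our own, and the sample lines are shortened versions of the output shown above.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded entity per non-empty JSONL line."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines between entities
        yield json.loads(line.decode("utf-8"))


# Shortened sample lines, for illustration only.
sample_lines = [
    b'{"name": "Unknown Name", "org_nr": "927822334"}',
    b"",
    b'{"name": "AMUND BROMSTAD", "org_nr": "991665595"}',
]
entities = list(iter_jsonl(sample_lines))
for entity in entities:
    print(entity["org_nr"], entity["name"])
```

Because the generator only holds one line at a time, memory use stays flat no matter how many entities the endpoint returns.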
Fetching the same file as CSV can be done as follows:
companies = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company?limit=3&file_type=csv",
    auth=auth,
).content.decode()
print(companies)
... which prints:
uuid,name,org_nr,update_timestamp,insert_timestamp,org_nr_schema
14d39ef2-d1a1-46b9-8571-440d23dbb6d0,EIENDOMSMEGLER 1 HEDMARK EIENDOM AS AVD GJØVIK,9721507 ...
b5b222fa-a9fe-4c04-bc9f-c9a9f9c32f44,SPAREBANKEN SOGN OG FJORDANE AVD FLORØ,973167529,2019- ...
59866d11-405e-44f2-9c70-f6345b2c9e4f,Unknown Name,927822334,2018-10-16 12:33:39.528266+00,2 ...
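To work with the CSV text in Python, the standard `csv` module is usually a safer choice than splitting on commas, since names may themselves contain commas. Here is a minimal sketch. The helper `parse_csv` and the sample row are made up for illustration, and we assume the API quotes such values in the usual CSV way.

```python
import csv
import io


def parse_csv(text: str) -> list:
    """Parse decoded CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Made-up sample using the header shown above, for illustration only.
SAMPLE_CSV = (
    "uuid,name,org_nr,update_timestamp,insert_timestamp,org_nr_schema\n"
    '00000000-0000-0000-0000-000000000000,"EXAMPLE, HOLDING AS",000000000,'
    "2020-01-01 00:00:00+00,2020-01-01 00:00:00+00,NO\n"
)
rows = parse_csv(SAMPLE_CSV)
for row in rows:
    print(row["org_nr"], row["name"])
```

Note that every value comes back as a string, so an org_nr with leading zeros is preserved, while numbers and timestamps need to be converted explicitly.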
When fetching composites things change slightly: the entries are nested one level deeper. In JSON, each entity becomes a nested object inside the entry, and in CSV the field names use dot notation to indicate the deeper structure.
Let's fetch the JSON version of CompanyCompositeEntity:
company_composites = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company-composite?limit=3",
    auth=auth,
).json()
print_obj(company_composites)
This will give quite a huge object:
[
{
"company": {
"insert_timestamp": "2018-04-27T16:33:37.237567+00:00",
"name": "AMUND BROMSTAD",
"org_nr": "991665595",
"org_nr_schema": "NO",
"update_timestamp": "2019-04-23T00:37:00.033835+00:00",
"uuid": "000004ff-c89f-405d-b8dc-fcc1bc293019"
},
"organization_type": {
"insert_timestamp": "2018-05-16T13:36:29.145604+00:00",
"org_nr_schema": "NO",
"organization_type_code": "FLI",
"organization_type_description": "Forening/lag/innretning",
"update_timestamp": "2020-05-11T05:18:02.127477+00:00",
"uuid": "dbccefba-f227-568d-9933-7084395aba98"
},
"company_affiliation_accountant": { ...},
"company_affiliation_auditor": { ... },
"company_affiliation_type_accountant": { ... },
"company_affiliation_type_auditor": { ... },
"company_details": { ... },
"company_location_business_address": { ... },
"company_nace_code_primary": { ... },
"company_nace_code_secondary": { ... },
"company_nace_code_tertiary": { ... },
"geo_location_business_address": { ... },
"nace_code_primary": { ... },
"nace_code_secondary": { ... },
"nace_code_tertiary": { ... }
},
{
"company": {
"name": "SYKKYLVEN GATEBILKLUBB",
...
If you run the same query with file_type=csv, as follows:
company_composites = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company-composite?limit=3&file_type=csv",
    auth=auth,
).content.decode()
print(company_composites)
... you will get:
company.uuid,company.name,company.org_nr,company.update_timestamp,company.insert_timestamp, ...
000004ff-c89f-405d-b8dc-fcc1bc293019,AMUND BROMSTAD,991665595,2019-04-23 00:37:00.033835+00 ...
0000075d-9a33-4d17-be6f-3adda98ebcae,SYKKYLVEN GATEBILKLUBB,912801535,2020-05-11 05:16:45.9 ...
00000d51-acaf-4db6-be73-13c727a57f9d,MOLIN MASKIN & TRANSPORT MARIUS TOBIASSEN,913966082,20 ...
Notice how company.uuid is nested using dot notation. This will
be relevant later when we select fields and filter on field data.
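To see how the nested JSON relates to the dotted CSV column names, here is a small sketch that flattens one decoded composite entry into dot-notation keys. The function `flatten` is our own illustration. We have not confirmed that the API produces its CSV columns in exactly this way, so the column order and the handling of missing sub-entities may differ.

```python
def flatten(entry: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single dict with dot-separated keys."""
    flat = {}
    for key, value in entry.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the sub-entity, carrying the path so far.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Shortened version of the composite entry shown above.
composite = {
    "company": {
        "uuid": "000004ff-c89f-405d-b8dc-fcc1bc293019",
        "name": "AMUND BROMSTAD",
        "org_nr": "991665595",
    },
    "organization_type": {"organization_type_code": "FLI"},
}
flat = flatten(composite)
for column, value in flat.items():
    print(column, "=", value)
```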