Some filters are specific to particular endpoints, but in general the Datasets API has two main types of filters: those applied to data fields of the entities you are downloading, and those based on what we call company batches.
To spice things up, let's switch to accounting data using the
https://api.enin.ai/datasets/v1/dataset/accounts-composite
API endpoint. This is a composite endpoint with a wealth of information
about company accounts and financial data. The accounts
entity composition looks as follows:
{
"accounts": AccountsEntity,
"accounts_type": AccountsTypeEntity,
"company": CompanyEntity,
"company_batch_mapping": CompanyBatchMappingEntity,
"accounts_highlights": AccountsHighlightsEntity,
"accounts_income_statement": AccountsIncomeStatementEntity,
"accounts_balance_sheet": AccountsBalanceSheetEntity,
}
This has the granularity of one row per accounting report, typically one per year, so a company will appear multiple times if it has multiple accounts.
Let's start by fetching info about the revenue of companies:
accounts_composites = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/accounts-composite",
    params={
        "limit": 3,
        "response_file_type": "csv",
        "keep_only_fields": ','.join(
            [
                "company.name",
                "company.org_nr",
                "accounts.accounting_year",
                "accounts_income_statement.total_operating_income",
            ]
        ),
    },
    auth=auth,
).content.decode()
print(accounts_composites)
This prints the following:
accounts.accounting_year,company.name,company.org_nr,accounts_income_statement.total_operat ...
2000,Rana Maskinstasjon A/S,832071072,1687000.0
2018,OMIT AS,916648464,0
2018,HALSEN BORETTSLAG,952825240,998000
Those results seem rather arbitrary. And if you removed the limit you'd get 4814791 entries.
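Since the endpoint returns plain CSV text, you can parse it with the standard library alone. Here is a minimal sketch using the rows printed above (the full header names are taken from the keep_only_fields list, since the printout truncates them):

```python
import csv
import io

# Sample CSV as returned by the endpoint; header names from keep_only_fields.
csv_text = (
    "accounts.accounting_year,company.name,company.org_nr,"
    "accounts_income_statement.total_operating_income\n"
    "2000,Rana Maskinstasjon A/S,832071072,1687000.0\n"
    "2018,OMIT AS,916648464,0\n"
    "2018,HALSEN BORETTSLAG,952825240,998000\n"
)

# DictReader keys each value by its column header, dot notation and all.
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    print(row["company.name"], row["accounts_income_statement.total_operating_income"])
```

For larger downloads you would typically feed the decoded response straight into a dataframe library instead, but the principle is the same.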
Filtering on entity fields is done using the previously introduced dot notation, in addition to an operator for more complex queries. For the most basic equality filtering you can simply use the dot notation as a query argument. The dot notation itself indicates to the endpoint that you want to filter.
Let's try it out by filtering on year:
accounts_composites = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/accounts-composite",
    params={
        "limit": 3,
        "response_file_type": "csv",
        "accounts.accounting_year": 2018,
        "keep_only_fields": ','.join(
            [
                "company.name",
                "company.org_nr",
                "accounts.accounting_year",
                "accounts_income_statement.total_operating_income",
            ]
        ),
    },
    auth=auth,
).content.decode()
print(accounts_composites)
In return we get:
accounts.accounting_year,company.name,company.org_nr,accounts_income_statement.total_operat ...
2018,TRAIN ING. AS,997924231,
2018,HELLY INVEST AS,990759413,
2018,MOMENTUM RETAIL NORWAY AS,913104358,12385000
It seems to have worked. Adding more filters narrows down the dataset, as the filters
are combined using the boolean AND
operator. Downloading this without a limit gives
349125 entries. You can apply filters on any connected entity.
Let's fetch data on a particular company, but this time also print the generated URL to highlight how the filters look when URL encoded, not just in object form.
accounts_composites_response = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/accounts-composite",
    params={
        "limit": 3,
        "response_file_type": "csv",
        "accounts.accounting_year": 2018,
        "company.org_nr_schema": 'NO',
        "company.org_nr": '812750062',
        "keep_only_fields": ','.join(
            [
                "company.name",
                "company.org_nr",
                "accounts.accounting_year",
                "accounts_income_statement.total_operating_income",
            ]
        ),
    },
    auth=auth,
)
accounts_composites = accounts_composites_response.content.decode()
print(accounts_composites_response.url)
print(accounts_composites)
The first print statement returns something like the following:
https://api.enin.ai/datasets/v1/dataset/accounts-composite?
limit=3&
response_file_type=csv&
accounts.accounting_year=2018&
company.org_nr_schema=NO&
company.org_nr=812750062&
keep_only_fields=
company.name,
company.org_nr,
accounts.accounting_year,
accounts_income_statement.total_operating_income
(I've taken the liberty of formatting the above URL for clarity; in the real printout the commas
are encoded as %2C
and there is no whitespace.)
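You can reproduce this encoding yourself with the standard library; requests does essentially the same thing under the hood when it serializes the params dict. A minimal sketch:

```python
from urllib.parse import urlencode

# The same kind of params dict we pass to requests.get.
params = {
    "limit": 3,
    "response_file_type": "csv",
    "accounts.accounting_year": 2018,
    "keep_only_fields": "company.name,company.org_nr",
}

# urlencode joins key=value pairs with '&' and percent-encodes
# reserved characters, turning each comma into %2C.
query = urlencode(params)
print(query)
```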
As you can see, parameters that use the dot notation are filters.
The second print statement gives the following:
accounts.accounting_year,company.name,company.org_nr,accounts_income_statement.total_operat ...
2018,Bergene Holm AS,812750062,1619605000
2018,Bergene Holm AS,812750062,1619310000
If you are acquainted with Norwegian accounting data, the fact that two entries were returned
for one company in the same year might be confusing. Generally speaking, a company only files
one official accounts report with the authorities per year. The reason there are two here is that one
is the corporate group accounts and the other is the company accounts for the same organization. Let's
illustrate this by adding accounts_type.accounts_type_key
to the keep_only_fields
parameter.
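The only change to the previous request is the extended field list; as a quick check, the joined keep_only_fields value becomes:

```python
# Field list for keep_only_fields, now including the accounts type key.
fields = [
    "company.name",
    "company.org_nr",
    "accounts.accounting_year",
    "accounts_type.accounts_type_key",
    "accounts_income_statement.total_operating_income",
]
keep_only_fields = ','.join(fields)
print(keep_only_fields)
```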
This gives:
accounts.accounting_year,accounts_type.accounts_type_key,company.name,company.org_nr,accoun ...
2018,annual_corporate_group_accounts,Bergene Holm AS,812750062,1619605000
2018,annual_company_accounts,Bergene Holm AS,812750062,1619310000
As we can see, these are distinguished by annual_corporate_group_accounts
and annual_company_accounts.
Adding accounts_type.accounts_type_key=annual_company_accounts
as a dot notation filter query
parameter brings this down to only one entry, as expected:
accounts.accounting_year,accounts_type.accounts_type_key,company.name,company.org_nr,accoun ...
2018,annual_company_accounts,Bergene Holm AS,812750062,1619310000
We have so far only handled equality filters. These are the default when using dot notation filters. In fact, the notation we have used so far is only the shorthand version of the equality filter.
The filter accounts_type.accounts_type_key=annual_company_accounts
written using an explicit operator
looks like this: accounts_type.accounts_type_key=EQ:annual_company_accounts.
Both give identical
results, but the latter can be changed to use other operators, which is especially useful when filtering
numeric values.
The general syntax for dot notation filtering is as follows:
<entity>.<field>=<negation><operator>:<filter_value>
<entity>
is as expected, a named collection of fields. The database equivalent is a table.
<field>
is also as expected, a named value as part of an entity. The database equivalent is a column.
<negation>
can either be a tilde (~
) or the empty string. If used, it negates the whole operation,
enabling operations like "not in" or "is not null".
<operator>
can be one of the following:
Operator | Equivalent SQL | Description |
---|---|---|
GTE | >= | Greater than or equal. |
LTE | <= | Less than or equal. |
GT | > | Greater than. |
LT | < | Less than. |
EQ | = | Equal. |
IN | in | Checks equality on a comma separated list of values. |
IS | is | Checks identity. Only works for empty string or null. |
ILIKE | ilike | Case-insensitive partial text match. |
LIKE | like | Case-sensitive partial text match. |
The ILIKE
and LIKE
operators use %
as a wildcard of any length, and _
as a single-character wildcard. You can escape these as \%
and \_.
Finally, <filter_value>
is the value you want the filtering to be done against. It must be a
supplied literal value and cannot reference other fields.
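To make the syntax concrete, here is a small helper (my own, not part of any API client) that assembles a filter value from an operator, optional negation, and one or more values:

```python
def filter_value(operator, *values, negate=False):
    """Build a dot notation filter value like 'EQ:2018' or '~IN:2008,2018'."""
    prefix = "~" if negate else ""
    return f"{prefix}{operator}:" + ",".join(str(v) for v in values)

print(filter_value("EQ", 2018))                 # EQ:2018
print(filter_value("IN", 2008, 2018))           # IN:2008,2018
print(filter_value("IN", 2008, 2018, negate=True))  # ~IN:2008,2018
print(filter_value("ILIKE", "%holm%"))          # ILIKE:%holm%
```

The return values are plain strings, so they slot directly into the params dict of a requests call, e.g. `"accounts.accounting_year": filter_value("IN", 2008, 2018)`.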
Let's try this out by filtering on a numeric value in the accounts data. Let's find all accounts with more than 50 billion NOK in revenue in 2018.
URL encoded you can write this as:
accounts.accounting_year=EQ:2018&accounts_income_statement.total_operating_income=GT:50000000000
Let's do this in python:
accounts = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/accounts-composite",
    params={
        "response_file_type": "csv",
        "accounts_type.accounts_type_key": "EQ:annual_company_accounts",
        "accounts.accounting_year": "EQ:2018",
        "accounts_income_statement.total_operating_income": "GT:50000000000",
        "keep_only_fields": ','.join(
            [
                "company.name",
                "company.org_nr",
                "accounts.accounting_year",
                "accounts_income_statement.total_operating_income",
            ]
        ),
    },
    auth=auth,
).content.decode()
print(accounts)
This returns 4 accounts:
accounts.accounting_year,company.name,company.org_nr,accounts_income_statement.total_operat ...
2018,HELSE SØR-ØST RHF,991324968,75744464000
2018,EQUINOR ENERGY AS,990888213,214951445000
2018,EQUINOR ASA,923609016,483083652000
2018,HYDRO ALUMINIUM AS,917537534,52570000000
If we would like another accounting year as well, say 2008, you could use the IN
operator
like this:
accounts.accounting_year=IN:2008,2018
Using this returns 9 accounts including those previously fetched:
accounts.accounting_year,company.name,company.org_nr,accounts_income_statement.total_operat ...
2018,EQUINOR ENERGY AS,990888213,214951445000
2008,EXXONMOBIL EXPLORATION AND PRODUCTION NORWAY AS,914048990,67882000000
2008,ESSO NORGE AS,914803802,55560000000
2008,EQUINOR ENERGY AS,990888213,81474000000
2008,TOTAL E&P NORGE AS,927066440,57122000000
2018,HELSE SØR-ØST RHF,991324968,75744464000
2018,EQUINOR ASA,923609016,483083652000
2008,EQUINOR ASA,923609016,588422000000
2018,HYDRO ALUMINIUM AS,917537534,52570000000
The IN
operator is useful for many sorts of operations. You can, for instance, fetch the above list
directly using their org_nr
values with the following query string:
company.org_nr=IN:990888213,914048990,914803802,990888213,927066440,991324968,923609016,923609016,917537534
As a Python script, you can try it like this:
accounts = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/accounts-composite",
    params={
        "response_file_type": "csv",
        "accounts_type.accounts_type_key": "EQ:annual_company_accounts",
        "accounts.accounting_year": "IN:2008,2018",
        "company.org_nr_schema": "EQ:NO",
        "company.org_nr": "IN:" + ','.join(
            [
                "914048990",
                "914803802",
                "917537534",
                "923609016",
                "927066440",
                "990888213",
                "991324968",
            ]
        ),
        "keep_only_fields": ','.join(
            [
                "company.name",
                "company.org_nr",
                "accounts.accounting_year",
                "accounts_income_statement.total_operating_income",
            ]
        ),
    },
    auth=auth,
).content.decode()
print(accounts)
This returns all of the previous entries, plus a few more where some of the companies didn't have 50 billion NOK in revenue:
accounts.accounting_year,company.name,company.org_nr,accounts_income_statement.total_operat ...
2018,EXXONMOBIL EXPLORATION AND PRODUCTION NORWAY AS,914048990,30239000000
2008,EXXONMOBIL EXPLORATION AND PRODUCTION NORWAY AS,914048990,67882000000
2018,ESSO NORGE AS,914803802,31061000000
2008,ESSO NORGE AS,914803802,55560000000
2018,TOTAL E&P NORGE AS,927066440,35091000000
2008,TOTAL E&P NORGE AS,927066440,57122000000
2018,EQUINOR ASA,923609016,483083652000
2008,EQUINOR ASA,923609016,588422000000
2018,HELSE SØR-ØST RHF,991324968,75744464000
2008,HELSE SØR-ØST RHF,991324968,48986731000
2018,EQUINOR ENERGY AS,990888213,214951445000
2008,EQUINOR ENERGY AS,990888213,81474000000
2018,HYDRO ALUMINIUM AS,917537534,52570000000
2008,HYDRO ALUMINIUM AS,917537534,48952000000
This is all well and good, but what happens if we want 10000 companies? In that case you will hit a technical limitation: the URL is capped to a fixed number of characters. To handle this we use company batches.
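As a rough illustration of why (the exact cap varies by server and proxy, but a few thousand characters is a common ceiling), the encoded query string for an IN filter over 10000 nine-digit org numbers is enormous on its own:

```python
from urllib.parse import urlencode

# 10000 fabricated nine-digit organization numbers, for illustration only.
org_nrs = [str(900000000 + i) for i in range(10000)]

# Each comma is percent-encoded as %2C, inflating the string further.
query = urlencode({"company.org_nr": "IN:" + ",".join(org_nrs)})
print(len(query))  # well over 100,000 characters
```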