Previous tutorial: 2. Making requests with the Datasets API
There is a wealth of fields available, but you probably don't want all of them in one go. There are
two ways to filter down which fields you get in return: the keep_only_fields query parameter and
the ignore_fields query parameter. Both can be used on individual fields like
company.uuid or company.org_nr_schema, or you can apply them to all fields of an entity by
omitting the last part of the dot notation. For example, using just company as the parameter
value applies it to all fields of that entity.
If you would like to ignore some fields or entities, then ignore_fields is your go-to parameter.
Say you don't want the UUID of the CompanyEntity; then you could write the query parameter as
ignore_fields=company.uuid. You can also specify a list of fields or entities to ignore
by separating them with commas (,). Say you would also like to ignore the organization
number schema (typically the country code); then you'd write the query parameter as
ignore_fields=company.uuid,company.org_nr_schema. Or, if you would like to ignore the company
entity altogether, you could write the query parameter as just ignore_fields=company.
Let's try this out on the CompanyEntity endpoint:
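import requests

# auth and print_obj are assumed to be defined already, e.g. in the previous tutorial.
companies = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company?limit=3&ignore_fields=company.uuid,company.org_nr_schema",
    auth=auth,
).json()
print_obj(companies)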
Because the argument list is getting long, let's use the requests library's query parameter
functionality (the params argument). The above query is identical to:
companies = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company",
    params={
        "limit": 3,
        "ignore_fields": "company.uuid,company.org_nr_schema",
    },
    auth=auth,
).json()
print_obj(companies)
We can also build the ignore_fields value from a Python list using str.join(). This means that the above
is identical to the following:
companies = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company",
    params={
        "limit": 3,
        "ignore_fields": ','.join(
            [
                "company.uuid",
                "company.org_nr_schema",
            ]
        )
    },
    auth=auth,
).json()
print_obj(companies)
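One way to take this a step further is to wrap the joining in a small helper. The sketch below is only an illustration: the function name field_params and its arguments are hypothetical and not part of the Datasets API. Only the query parameter names limit, ignore_fields, and keep_only_fields come from this tutorial.

```python
def field_params(limit=None, ignore=(), keep_only=()):
    """Build a query-parameter dict for requests.get(..., params=...).

    Hypothetical helper: only the parameter names limit, ignore_fields
    and keep_only_fields come from the API as described in this tutorial.
    """
    params = {}
    if limit is not None:
        params["limit"] = limit
    # Each list is sent as a single comma-delimited string.
    if ignore:
        params["ignore_fields"] = ",".join(ignore)
    if keep_only:
        params["keep_only_fields"] = ",".join(keep_only)
    return params


params = field_params(limit=3, ignore=["company.uuid", "company.org_nr_schema"])
print(params)
```

Passing params=params together with auth=auth to requests.get should then send the same request as the example above.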
If you are making more advanced calls, as we are about to do, you should consider parameterizing your requests like this in your own tool or programming language.
Running any of the three equivalent requests above prints the following:
[
    {
        "name": "EIENDOMSMEGLER 1 HEDMARK EIENDOM AS AVD GJØVIK",
        "org_nr": "972150797",
        "insert_timestamp": "2019-08-05T00:18:08.349297+00:00",
        "update_timestamp": "2019-08-05T00:18:08.349297+00:00"
    },
    {
        "name": "SPAREBANKEN SOGN OG FJORDANE AVD FLORØ",
        "org_nr": "973167529",
        "insert_timestamp": "2019-07-24T07:34:44.004922+00:00",
        "update_timestamp": "2019-07-24T07:34:44.004922+00:00"
    },
    {
        "name": "Unknown Name",
        "org_nr": "927822334",
        "insert_timestamp": "2018-10-16T12:33:39.528266+00:00",
        "update_timestamp": "2018-10-16T12:33:39.528266+00:00"
    }
]
Notice how org_nr_schema and uuid are missing.
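A quick way to confirm this programmatically is to scan the returned records for the ignored names. The sketch below assumes, based on the output above, that the company endpoint returns flat keys such as name and org_nr without the company. prefix. The helper leaked_fields is hypothetical, and the sample record is copied from the output above.

```python
def leaked_fields(records, ignored, entity="company"):
    """Return the ignored field names that still appear in any record."""
    prefix = entity + "."
    # Strip the entity prefix, since the records above use bare field names.
    bare = {f[len(prefix):] if f.startswith(prefix) else f for f in ignored}
    present = set()
    for record in records:
        # Iterating a dict yields its keys.
        present.update(bare.intersection(record))
    return sorted(present)


sample = [
    {
        "name": "Unknown Name",
        "org_nr": "927822334",
        "insert_timestamp": "2018-10-16T12:33:39.528266+00:00",
        "update_timestamp": "2018-10-16T12:33:39.528266+00:00",
    }
]
ignored = ["company.uuid", "company.org_nr_schema"]
print(leaked_fields(sample, ignored))  # an empty list means nothing leaked
```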
The previous section was all about ignoring fields you aren't interested in. That is nice to have,
but what if you only need a few fields and entities? Listing almost all fields can get cumbersome
fast. This is especially true if you are using composite API endpoints, which tend to have quite
a lot of fields and entities. Also, more fields are bound to be added over time, and they would
show up in your results if you relied only on ignore_fields.
This is where the keep_only_fields query parameter comes into play. It drops every field
except those you specify. Let's try it out by fetching CompanyEntity with its primary
NaceCodeEntity using the https://api.enin.ai/datasets/v1/dataset/company-composite API endpoint and,
say, returning it as a CSV file:
company_composites = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company-composite",
    params={
        "limit": 3,
        "response_file_type": "csv",
        "keep_only_fields": ','.join(
            [
                "company",
                "nace_code_primary",
            ]
        )
    },
    auth=auth,
).content.decode()
print(company_composites)
Notice that we asked only for the company and NACE code entities, not for particular fields of those entities. This will return the following:
company.uuid,company.name,company.org_nr,company.update_timestamp,company.insert_timestamp, ...
000004ff-c89f-405d-b8dc-fcc1bc293019,AMUND BROMSTAD,991665595,2019-04-23 00:37:00.033835+00 ...
0000075d-9a33-4d17-be6f-3adda98ebcae,SYKKYLVEN GATEBILKLUBB,912801535,2020-05-14 01:03:19.4 ...
00000d51-acaf-4db6-be73-13c727a57f9d,MOLIN MASKIN & TRANSPORT MARIUS TOBIASSEN,913966082,20 ...
That's still a lot of data; it doesn't even fit in the output box above. Let's use ignore_fields to remove the company UUID and timestamps:
company_composites = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company-composite",
    params={
        "limit": 3,
        "response_file_type": "csv",
        "ignore_fields": ','.join(
            [
                "company.uuid",
                "company.update_timestamp",
                "company.insert_timestamp",
            ]
        ),
        "keep_only_fields": ','.join(
            [
                "company",
                "nace_code_primary",
            ]
        )
    },
    auth=auth,
).content.decode()
print(company_composites)
At least we can now fit some NACE code data into the output box:
company.name,company.org_nr,company.org_nr_schema,nace_code_primary.uuid,nace_code_primary.
AMUND BROMSTAD,991665595,NO,,,,,,,,
SYKKYLVEN GATEBILKLUBB,912801535,NO,840f6832-13de-43ab-9455-93822c3f5717,94.991,5,Aktivitet ...
MOLIN MASKIN & TRANSPORT MARIUS TOBIASSEN,913966082,NO,cc9b075e-ce7f-4255-b894-97ed6e9257b4 ...
It turns out that AMUND BROMSTAD doesn't have a NACE code; you can see that from the empty data
entries. We can also just barely see that SYKKYLVEN GATEBILKLUBB has the NACE code 94.991.
Let's keep only nace_code_primary.nace_code and nace_code_primary.short_name, and
also ignore company.org_nr_schema:
company_composites = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company-composite",
    params={
        "limit": 3,
        "response_file_type": "csv",
        "ignore_fields": ','.join(
            [
                "company.uuid",
                "company.org_nr_schema",
                "company.update_timestamp",
                "company.insert_timestamp",
            ]
        ),
        "keep_only_fields": ','.join(
            [
                "company",
                "nace_code_primary.nace_code",
                "nace_code_primary.short_name",
            ]
        )
    },
    auth=auth,
).content.decode()
print(company_composites)
Finally, a nice short selection of fields we are interested in:
company.name,company.org_nr,nace_code_primary.nace_code,nace_code_primary.short_name
AMUND BROMSTAD,991665595,,
SYKKYLVEN GATEBILKLUBB,912801535,94.991,Interesseorganisasjoner ellers
MOLIN MASKIN & TRANSPORT MARIUS TOBIASSEN,913966082,43.120,Grunnarbeid
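If you want to work with the rows in Python rather than just print them, the standard library's csv module can parse the response text. The sketch below uses the output above as a literal stand-in for company_composites, so it runs without calling the API.

```python
import csv
import io

# Stand-in for the decoded response body shown above.
csv_text = """company.name,company.org_nr,nace_code_primary.nace_code,nace_code_primary.short_name
AMUND BROMSTAD,991665595,,
SYKKYLVEN GATEBILKLUBB,912801535,94.991,Interesseorganisasjoner ellers
MOLIN MASKIN & TRANSPORT MARIUS TOBIASSEN,913966082,43.120,Grunnarbeid
"""

# DictReader uses the header row as keys for every following row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Companies without a primary NACE code show up as empty strings.
missing_nace = [
    row["company.name"]
    for row in rows
    if not row["nace_code_primary.nace_code"]
]
print(missing_nace)
```

Keeping the values as strings, as csv does by default, also preserves codes such as 43.120 that would lose their trailing zero if parsed as numbers.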
The takeaway is that you can combine ignore_fields and keep_only_fields on
both fields and entities to select exactly the data you need.
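One way to reason about how the two parameters interact is the local model below. It is a sketch that is consistent with the outputs shown in this tutorial, not a description of how the API is implemented. The function names are hypothetical, the column list is shortened for illustration, and the model assumes that a column survives when it matches a keep_only_fields entry (or none is given) and matches no ignore_fields entry.

```python
def matches(column, selector):
    """True if a selector names the column itself or the column's entity."""
    return column == selector or column.startswith(selector + ".")


def select_columns(columns, keep_only=(), ignore=()):
    """Model which columns remain after keep_only_fields and ignore_fields."""
    selected = []
    for column in columns:
        kept = not keep_only or any(matches(column, s) for s in keep_only)
        ignored = any(matches(column, s) for s in ignore)
        if kept and not ignored:
            selected.append(column)
    return selected


# A shortened, illustrative column list for the company-composite endpoint.
columns = [
    "company.uuid",
    "company.name",
    "company.org_nr",
    "company.org_nr_schema",
    "nace_code_primary.uuid",
    "nace_code_primary.nace_code",
    "nace_code_primary.short_name",
]
result = select_columns(
    columns,
    keep_only=[
        "company",
        "nace_code_primary.nace_code",
        "nace_code_primary.short_name",
    ],
    ignore=["company.uuid", "company.org_nr_schema"],
)
print(result)
```

Under these assumptions the result contains the same four columns as the header of the last CSV output above.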
Running this without the limit parameter took 3 minutes and returned 2165983 entries, which
is exactly how many unique organization numbers we have in our system as of May 2020. That
is a lot. Let's start working on filtering.
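With result sets of this size, it can help to process the CSV response row by row instead of holding it all in memory. The sketch below is an assumption about how one might do that, not something this tutorial ran: count_rows is a hypothetical helper that accepts any iterable of lines (bytes or text), and the sample lines are a stand-in for a streamed response.

```python
import csv


def count_rows(lines, encoding="utf-8"):
    """Count CSV data rows (excluding the header) from an iterable of lines."""
    # Accept both bytes and str lines, since streamed responses may yield bytes.
    text_lines = (
        line.decode(encoding) if isinstance(line, bytes) else line
        for line in lines
    )
    reader = csv.reader(text_lines)
    next(reader, None)  # skip the header row, if there is one
    return sum(1 for _ in reader)


# Stand-in for the lines of a streamed CSV response.
sample_lines = [
    b"company.name,company.org_nr",
    b"AMUND BROMSTAD,991665595",
    b"SYKKYLVEN GATEBILKLUBB,912801535",
]
print(count_rows(sample_lines))
```

With the requests library, such lines could come from calling iter_lines() on a response that was requested with stream=True.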