There is a wealth of fields available, but you probably don't want all of them in one go. There are two ways to filter down the fields you get in return: the keep_only_fields query parameter and the ignore_fields query parameter. Both can be applied to individual fields, like company.uuid or company.org_nr_schema, or to all fields of an entity at once by omitting the last part of the dot notation. For example, you can use just company as a parameter value, and it will apply to all fields of that entity.
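The dot-notation rule can be illustrated with a tiny client-side sketch. The function matches_selector is hypothetical and only mirrors the rule described above (a full name such as company.uuid covers exactly that field, while a bare entity name such as company covers every field of that entity); it is not the API's own implementation.

```python
def matches_selector(field: str, selector: str) -> bool:
    """Hypothetical illustration: is a dotted field name such as "company.uuid"
    covered by a selector that is either a full field name or an entity name?"""
    entity = field.split(".", 1)[0]
    # A dotted selector ("company.uuid") selects exactly that one field;
    # a bare entity selector ("company") selects every field of that entity.
    return selector == field or selector == entity


print(matches_selector("company.uuid", "company.uuid"))           # True
print(matches_selector("company.org_nr_schema", "company"))       # True
print(matches_selector("company.uuid", "company.org_nr_schema"))  # False
print(matches_selector("nace_code_primary.uuid", "company"))      # False
```

Comparing against the entity part, rather than doing a plain string prefix test, keeps a selector like company from accidentally matching an entity whose name merely starts with company.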
If you would like to ignore some fields or entities, then ignore_fields is your go-to parameter. Say you don't want the UUID of the CompanyEntity; then you could write the query parameter as ignore_fields=company.uuid. You can also specify a list of fields or entities you would like to ignore by separating them with commas (,). Say you would also like to ignore the organization number schema (typically the country code); then you'd write the query parameter as ignore_fields=company.uuid,company.org_nr_schema. Or, if you would like to ignore the company entity altogether, you could write the query parameter as just ignore_fields=company.
Let's try this out on the CompanyEntity endpoint:
# Fetch three companies, leaving out their UUID and organization number schema.
companies = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company?limit=3&ignore_fields=company.uuid,company.org_nr_schema",
    auth=auth,
).json()
print_obj(companies)
Because the argument list is getting long, let's use the query parameter functionality of the requests library instead. The above query is identical to:
companies = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company",
    # requests encodes these into the query string for us.
    params={
        "limit": 3,
        "ignore_fields": "company.uuid,company.org_nr_schema",
    },
    auth=auth,
).json()
print_obj(companies)
We can also build the ignore_fields value from a Python list using the str.join() method. This means that the params version above is identical to the following:
companies = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company",
    params={
        "limit": 3,
        # Join the list of fields into one comma-separated string.
        "ignore_fields": ','.join(
            [
                "company.uuid",
                "company.org_nr_schema",
            ]
        ),
    },
    auth=auth,
).json()
print_obj(companies)
If you are making more advanced calls, as we are about to do, you should consider parameterizing them like this in your own tool or programming language as well.
Running the previous code prints the following:
[
{
"name": "EIENDOMSMEGLER 1 HEDMARK EIENDOM AS AVD GJØVIK",
"org_nr": "972150797",
"insert_timestamp": "2019-08-05T00:18:08.349297+00:00",
"update_timestamp": "2019-08-05T00:18:08.349297+00:00"
},
{
"name": "SPAREBANKEN SOGN OG FJORDANE AVD FLORØ",
"org_nr": "973167529",
"insert_timestamp": "2019-07-24T07:34:44.004922+00:00",
"update_timestamp": "2019-07-24T07:34:44.004922+00:00"
},
{
"name": "Unknown Name",
"org_nr": "927822334",
"insert_timestamp": "2018-10-16T12:33:39.528266+00:00",
"update_timestamp": "2018-10-16T12:33:39.528266+00:00"
}
]
Notice how org_nr_schema and uuid are missing.
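If you follow the advice to parameterize your calls, a small helper keeps the field lists readable. The function dataset_params below is a hypothetical convenience of our own, not part of requests or the Enin API: it assembles the query parameters used in this section and skips any that are not set. Passing its result through the standard library's urllib.parse.urlencode shows the query string it corresponds to; the comma comes out percent-encoded as %2C, which decodes back to the same comma-separated value as in the hand-written URL of the first call.

```python
from typing import Optional, Sequence
from urllib.parse import parse_qs, urlencode


def dataset_params(
    limit: Optional[int] = None,
    response_file_type: Optional[str] = None,
    ignore_fields: Sequence[str] = (),
    keep_only_fields: Sequence[str] = (),
) -> dict:
    """Hypothetical helper: build a params dict for requests.get(),
    joining field lists with commas and skipping unset parameters."""
    params = {}
    if limit is not None:
        params["limit"] = limit
    if response_file_type is not None:
        params["response_file_type"] = response_file_type
    if ignore_fields:
        params["ignore_fields"] = ",".join(ignore_fields)
    if keep_only_fields:
        params["keep_only_fields"] = ",".join(keep_only_fields)
    return params


params = dataset_params(limit=3, ignore_fields=["company.uuid", "company.org_nr_schema"])
print(params)
# {'limit': 3, 'ignore_fields': 'company.uuid,company.org_nr_schema'}

# The query string these parameters encode to; "," becomes "%2C".
query = urlencode(params)
print(query)
# limit=3&ignore_fields=company.uuid%2Ccompany.org_nr_schema

# Decoding gives back one ignore_fields value listing both fields.
print(parse_qs(query)["ignore_fields"])
# ['company.uuid,company.org_nr_schema']
```

In a real call you would pass the result as requests.get(url, params=dataset_params(...), auth=auth).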
The previous section was all about ignoring fields you aren't interested in. That is nice to have, but what if you only need a few fields and entities? Listing almost all fields to ignore gets cumbersome fast, especially with composite API endpoints, which tend to have many fields and entities. Also, more fields are bound to be added over time, and they would show up in your results if you relied only on ignore_fields.
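The point about new fields can be made concrete with a small simulation. In the sketch below, select_with_ignore and select_with_keep_only are our own illustrations of the two selection styles, ignoring what you don't want versus listing only what you do want (not the API's implementation), and company.new_field is a made-up field standing in for one added later. Both styles return the same fields today, but only the ignore-based one lets the new field through.

```python
from typing import List, Sequence


def is_selected(field: str, selectors: Sequence[str]) -> bool:
    # A selector is a full field name ("company.uuid") or a bare entity name ("company").
    entity = field.split(".", 1)[0]
    return any(s in (field, entity) for s in selectors)


def select_with_ignore(fields: Sequence[str], ignore: Sequence[str]) -> List[str]:
    # Everything not explicitly ignored is returned.
    return [f for f in fields if not is_selected(f, ignore)]


def select_with_keep_only(fields: Sequence[str], keep_only: Sequence[str]) -> List[str]:
    # Only explicitly requested fields are returned.
    return [f for f in fields if is_selected(f, keep_only)]


# CompanyEntity fields seen in this section.
fields_today = [
    "company.uuid", "company.name", "company.org_nr",
    "company.org_nr_schema", "company.insert_timestamp", "company.update_timestamp",
]
# Hypothetical field added to CompanyEntity at some later date.
fields_later = fields_today + ["company.new_field"]

ignore = ["company.uuid", "company.org_nr_schema",
          "company.insert_timestamp", "company.update_timestamp"]
keep_only = ["company.name", "company.org_nr"]

print(select_with_ignore(fields_later, ignore))
# ['company.name', 'company.org_nr', 'company.new_field']
print(select_with_keep_only(fields_later, keep_only))
# ['company.name', 'company.org_nr']
```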
This is where the keep_only_fields query parameter comes into play. It drops every field except those you specify. Let's try it out by fetching the CompanyEntity with its primary NaceCodeEntity from the https://api.enin.ai/datasets/v1/dataset/company-composite API endpoint and, say, returning it as a CSV file:
company_composites = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company-composite",
    params={
        "limit": 3,
        "response_file_type": "csv",
        # Keep every field of these two entities and nothing else.
        "keep_only_fields": ','.join(
            [
                "company",
                "nace_code_primary",
            ]
        ),
    },
    auth=auth,
).content.decode()  # the CSV arrives as bytes; decode it to text
print(company_composites)
Notice that we only asked for the company and primary NACE code entities, not for particular fields of those entities. This returns the following:
company.uuid,company.name,company.org_nr,company.update_timestamp,company.insert_timestamp, ...
000004ff-c89f-405d-b8dc-fcc1bc293019,AMUND BROMSTAD,991665595,2019-04-23 00:37:00.033835+00 ...
0000075d-9a33-4d17-be6f-3adda98ebcae,SYKKYLVEN GATEBILKLUBB,912801535,2020-05-14 01:03:19.4 ...
00000d51-acaf-4db6-be73-13c727a57f9d,MOLIN MASKIN & TRANSPORT MARIUS TOBIASSEN,913966082,20 ...
That's still a lot of data; it doesn't even fit in the output box above. Let's use ignore_fields to remove the company UUID and timestamps:
company_composites = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company-composite",
    params={
        "limit": 3,
        "response_file_type": "csv",
        # Drop these fields from within the kept entities.
        "ignore_fields": ','.join(
            [
                "company.uuid",
                "company.update_timestamp",
                "company.insert_timestamp",
            ]
        ),
        "keep_only_fields": ','.join(
            [
                "company",
                "nace_code_primary",
            ]
        ),
    },
    auth=auth,
).content.decode()
print(company_composites)
Now we can at least fit some NACE code data into the output box:
company.name,company.org_nr,company.org_nr_schema,nace_code_primary.uuid,nace_code_primary.
AMUND BROMSTAD,991665595,NO,,,,,,,,
SYKKYLVEN GATEBILKLUBB,912801535,NO,840f6832-13de-43ab-9455-93822c3f5717,94.991,5,Aktivitet ...
MOLIN MASKIN & TRANSPORT MARIUS TOBIASSEN,913966082,NO,cc9b075e-ce7f-4255-b894-97ed6e9257b4 ...
It turns out AMUND BROMSTAD doesn't have a NACE code, which you can see from the empty entries. We can also just barely see that SYKKYLVEN GATEBILKLUBB has the NACE code 94.991.
Let's keep only nace_code_primary.nace_code and nace_code_primary.short_name, and also ignore company.org_nr_schema:
company_composites = requests.get(
    "https://api.enin.ai/datasets/v1/dataset/company-composite",
    params={
        "limit": 3,
        "response_file_type": "csv",
        "ignore_fields": ','.join(
            [
                "company.uuid",
                "company.org_nr_schema",
                "company.update_timestamp",
                "company.insert_timestamp",
            ]
        ),
        # Keep all remaining company fields, but only two NACE code fields.
        "keep_only_fields": ','.join(
            [
                "company",
                "nace_code_primary.nace_code",
                "nace_code_primary.short_name",
            ]
        ),
    },
    auth=auth,
).content.decode()
print(company_composites)
Finally, a nice short selection of fields we are interested in:
company.name,company.org_nr,nace_code_primary.nace_code,nace_code_primary.short_name
AMUND BROMSTAD,991665595,,
SYKKYLVEN GATEBILKLUBB,912801535,94.991,Interesseorganisasjoner ellers
MOLIN MASKIN & TRANSPORT MARIUS TOBIASSEN,913966082,43.120,Grunnarbeid
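Because the response is plain CSV text, the standard library's csv module is enough to work with it. The sketch below copies the output printed above into a string, so it runs without an API call, and uses csv.DictReader to find the companies whose primary NACE code is empty, which is how a missing NACE code shows up.

```python
import csv
import io

# The CSV output printed above, copied so the example runs offline.
company_composites = """company.name,company.org_nr,nace_code_primary.nace_code,nace_code_primary.short_name
AMUND BROMSTAD,991665595,,
SYKKYLVEN GATEBILKLUBB,912801535,94.991,Interesseorganisasjoner ellers
MOLIN MASKIN & TRANSPORT MARIUS TOBIASSEN,913966082,43.120,Grunnarbeid
"""

# DictReader uses the header row as keys for every data row.
rows = list(csv.DictReader(io.StringIO(company_composites)))

# A missing NACE code comes back as an empty string.
without_nace = [row["company.name"] for row in rows if not row["nace_code_primary.nace_code"]]
print(without_nace)  # ['AMUND BROMSTAD']
```

All values come back as strings. Keeping NACE codes that way is deliberate: converting 43.120 to a float would silently drop the trailing zero.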
The takeaway is that you can combine ignore_fields and keep_only_fields on both fields and entities to select exactly the data you need.
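One reading of how the two parameters combined in this section: a field is returned if it matches keep_only_fields (when that parameter is given) and does not match ignore_fields. The sketch below encodes that reading, which is inferred from the outputs above rather than taken from API documentation, and applies it to the columns seen in this section. Both filter_fields and the column list are our own; the real endpoint returns more columns than the ones listed here. With the parameters of the last request, it reproduces the header of the final output.

```python
from typing import List, Sequence


def is_selected(field: str, selectors: Sequence[str]) -> bool:
    # Full field name ("company.uuid") or bare entity name ("company").
    entity = field.split(".", 1)[0]
    return any(s in (field, entity) for s in selectors)


def filter_fields(
    fields: Sequence[str],
    keep_only_fields: Sequence[str] = (),
    ignore_fields: Sequence[str] = (),
) -> List[str]:
    """Hypothetical model: keep a field if it is requested (or nothing was
    requested) and it is not explicitly ignored."""
    kept = []
    for field in fields:
        if keep_only_fields and not is_selected(field, keep_only_fields):
            continue  # not among the requested fields or entities
        if is_selected(field, ignore_fields):
            continue  # explicitly ignored, even inside a kept entity
        kept.append(field)
    return kept


# Columns of the company-composite endpoint seen in this section (partial).
columns = [
    "company.uuid", "company.name", "company.org_nr", "company.org_nr_schema",
    "company.update_timestamp", "company.insert_timestamp",
    "nace_code_primary.uuid", "nace_code_primary.nace_code", "nace_code_primary.short_name",
]

result = filter_fields(
    columns,
    keep_only_fields=["company", "nace_code_primary.nace_code", "nace_code_primary.short_name"],
    ignore_fields=["company.uuid", "company.org_nr_schema",
                   "company.update_timestamp", "company.insert_timestamp"],
)
print(result)
# ['company.name', 'company.org_nr', 'nace_code_primary.nace_code', 'nace_code_primary.short_name']
```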
Running the last request above without the limit parameter took 3 minutes and returned 2,165,983 entries, which is exactly the number of unique organization numbers in our system as of May 2020. That is a lot, so let's start working on filtering.