Data Platform/Evaluations/2021 data catalog selection/Rubric/Amundsen

From Wikitech

Core Service and Dependency Setup

Ingestion Configuration

Progress Status

Perceptions

Outcome

Razzi's take on Amundsen

Pros:

- simple architecture: three Flask services, all written in Python (as opposed to DataHub, which uses both Java and Python)

- ingestion architecture is simple: Python scripts or Airflow DAGs that make HTTP API requests

- "social" UI features, such as frequent users and owners

- loose coupling means a relational database can serve as the data store instead of Neo4j (https://github.com/amundsen-io/amundsenrds)
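The ingestion point in the list above describes scripts that talk to the services over HTTP. The following is a minimal sketch of that pattern using only the Python standard library. Several details are assumptions for illustration and are not confirmed Amundsen API: the `/table/<key>/description` path, the JSON body shape, the base URL and port, and the helper names `build_description_update` and `send`. The sketch builds the request separately from sending it, so you can inspect it without a running service.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for a locally running metadata service; adjust for your deployment.
METADATA_BASE_URL = "http://localhost:5002"


def build_description_update(table_key: str, description: str,
                             base_url: str = METADATA_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT that sets a table description.

    Hypothetical endpoint: the /table/<key>/description path and the
    {"description": ...} body are illustrative assumptions.
    """
    # Table keys contain ':' and '/', so percent-encode the whole key as one path segment.
    quoted_key = urllib.parse.quote(table_key, safe="")
    url = f"{base_url}/table/{quoted_key}/description"
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (requires a live service)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `send(build_description_update("hive://gold.test_schema/test_table1", "Nightly snapshot"))` would issue the PUT, with an illustrative table key. An Airflow DAG could wrap the same call in a task. Keeping the request construction pure makes the script easy to test without a live service.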

Cons:

- the community seems to be losing momentum: https://github.com/amundsen-io/amundsen#blog-posts-and-interviews lists a flurry of events in 2019/2020 but nothing in 2021

- only supports polling for data updates, unless we also deploy Apache Atlas; a push ingestion API is on their roadmap

- documentation is somewhat lacking: few ingestion examples, and broken links in the docs

- some dependencies are getting out of date: Elasticsearch 6 (v7 was released in 2019) and Node.js 12 (v13 was released in 2019)
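The polling limitation in the cons list means every update cycle has to re-read the source and work out what changed. A push API would instead receive only the changed records as they happen. The following is a hypothetical sketch of that polling loop in generic Python. The snapshot format (table key mapped to a metadata fingerprint) and the names `diff_snapshots` and `poll` are illustrative assumptions, not part of Amundsen's databuilder.

```python
import time
from typing import Callable, Dict, Set, Tuple

# Hypothetical snapshot format: table key -> fingerprint of its metadata (schema, description, ...).
Snapshot = Dict[str, str]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, changed) table keys between two polls."""
    added = set(current.keys() - previous.keys())
    removed = set(previous.keys() - current.keys())
    changed = {key for key in current.keys() & previous.keys() if current[key] != previous[key]}
    return added, removed, changed


def poll(fetch: Callable[[], Snapshot],
         publish: Callable[[Set[str], Set[str], Set[str]], None],
         interval_seconds: float,
         cycles: int) -> None:
    """Run a fixed number of polling cycles, publishing only when something changed."""
    previous: Snapshot = {}
    for i in range(cycles):
        # The whole source is re-read on every cycle, even if nothing changed.
        current = fetch()
        added, removed, changed = diff_snapshots(previous, current)
        if added or removed or changed:
            publish(added, removed, changed)
        previous = current
        # Changes made between polls are only noticed on the next cycle.
        if i < cycles - 1:
            time.sleep(interval_seconds)
```

The cost of this design is visible in the loop. Every cycle pays for a full `fetch()`, and the freshness of the catalog is bounded by `interval_seconds`. That is the trade-off a push ingestion API would remove.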

The Amundsen home page running in Docker, after loading their small sample dataset from example/scripts/sample_data_loader.py.
Summary of Amundsen from the README on GitHub.

Amundsen was created by Lyft and is now hosted by the Linux Foundation.