How to integrate Vizro with Kedro Data Catalog
This page describes how to integrate Vizro with Kedro, an open-source Python framework to create reproducible, maintainable, and modular data science code. For Pandas datasets registered in a Kedro data catalog, Vizro provides a convenient way to visualize them.
Installation
If you already have Kedro installed then you do not need to install any extra dependencies. If you do not have Kedro installed then you should run:
Use datasets from the Kedro Data Catalog
vizro.integrations.kedro
provides functions to help generate and process a Kedro Data Catalog. Given a Kedro Data Catalog catalog
, the general pattern to add datasets into the Vizro data manager is:
from vizro.integrations import kedro as kedro_integration
from vizro.managers import data_manager
for dataset_name, dataset in kedro_integration.datasets_from_catalog(catalog).items():
data_manager[dataset_name] = dataset
This imports all datasets of type kedro_datasets.pandas
from the Kedro catalog
into the Vizro data_manager
.
The catalog
variable may have been created in a number of different ways:
- Kedro project path. Vizro exposes a helper function
vizro.integrations.kedro.catalog_from_project
to generate acatalog
given the path to a Kedro project. - Kedro Jupyter session. This automatically exposes
catalog
. - Data Catalog configuration file (
catalog.yaml
). This can create acatalog
entirely independently of a Kedro project usingkedro.io.DataCatalog.from_config
.
The full code for these different cases is given below.
Import a Kedro Data Catalog into the Vizro data manager
from kedro.io import DataCatalog
import yaml
from vizro.integrations import kedro as kedro_integration
from vizro.managers import data_manager
catalog = DataCatalog.from_config(yaml.safe_load(Path("catalog.yaml").read_text(encoding="utf-8")))
for dataset_name, dataset in kedro_integration.datasets_from_catalog(catalog).items():
data_manager[dataset_name] = dataset