{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Building a Simple TEEHR Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook you will learn how to build a simple TEEHR dataset, export it to a joined Parquet file, and run a few simple queries against it. This example is intentionally simple and does not show all of the functionality of the TEEHR toolsets or approach.\n", "\n", "All of the input data are CSV and GeoJSON files; this is intended to be\n", "the simplest example of how TEEHR can be used." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the required packages\n", "import pandas as pd\n", "import geopandas as gpd\n", "import duckdb\n", "import datetime\n", "\n", "from pathlib import Path\n", "from teehr.classes.duckdb_database import DuckDBDatabase\n", "from teehr.classes.duckdb_joined_parquet import DuckDBJoinedParquet\n", "import teehr.queries.duckdb as tqd\n", "\n", "import holoviews as hv\n", "import geoviews as gv\n", "import hvplot.pandas\n", "import cartopy.crs as ccrs\n", "from holoviews import opts" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "# Remove any previous copy, then download the example data (which we will convert to TEEHR format) from S3.\n", "!rm -rf ~/teehr/example-1\n", "!aws s3 cp --recursive --no-sign-request s3://ciroh-rti-public-data/teehr-workshop-devcon-2024/workshop-data/example-1 ~/teehr/example-1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define the raw data and TEEHR 'dataset' directory locations\n", "RAW_DATA_FILEPATH = Path(Path.home(), \"teehr/example-1/raw\")\n", "TEEHR_BASE = Path(Path.home(), \"teehr/example-1/teehr_base\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# While TEEHR is very flexible with regard to where data is 
stored and how it is named, \n", "# we have established a standard. The following code sets up the standard folder structure.\n", "# Create a folder for each type of TEEHR 'table' (DB_FILEPATH is a file, so it is not created here)\n", "PRIMARY_FILEPATH = Path(TEEHR_BASE, 'primary')\n", "SECONDARY_FILEPATH = Path(TEEHR_BASE, 'secondary')\n", "CROSSWALK_FILEPATH = Path(TEEHR_BASE, 'crosswalk')\n", "GEOMETRY_FILEPATH = Path(TEEHR_BASE, 'geometry')\n", "ATTRIBUTE_FILEPATH = Path(TEEHR_BASE, 'attribute')\n", "JOINED_FILEPATH = Path(TEEHR_BASE, 'joined')\n", "DB_FILEPATH = Path(TEEHR_BASE, 'teehr.db')\n", "\n", "PRIMARY_FILEPATH.mkdir(exist_ok=True, parents=True)\n", "SECONDARY_FILEPATH.mkdir(exist_ok=True, parents=True)\n", "CROSSWALK_FILEPATH.mkdir(exist_ok=True, parents=True)\n", "GEOMETRY_FILEPATH.mkdir(exist_ok=True, parents=True)\n", "ATTRIBUTE_FILEPATH.mkdir(exist_ok=True, parents=True)\n", "JOINED_FILEPATH.mkdir(exist_ok=True, parents=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Look at the folder/file structure. Notice that the raw data we downloaded as a starting point is \n", "# in 'raw', but the folders in 'teehr_base' are still empty. We will populate them next.\n", "!tree ~/teehr/example-1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convert data to TEEHR format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section, we will convert the following data types from CSV or GeoJSON format to TEEHR format.\n", "\n", "