Deploy Elavon SFTP ingest server into production (#1695)
* docs(datasets and tables): added and revamped RT dataset section and docs with links out to dbt docs
* docs(datasets and tables): added highlighting to important areas in dataset docs
* (portfolio docs): add notebook tips
* (portfolio docs): change order of sections
* (portfolio docs): more description about decimals and rounding
* (portfolio docs): fix formatting
* (portfolio docs): fix formatting
* start on gtfsrt-v2
* (portfolio_docs): fix typo
* docs: testing broken action
* switch rt node pool to c2 instances
* get new calitp version
* import gcs models from calitp-py!
* airtable: add macro TODO
* airtable: gtfs datasets mart & staging updates
* airtable: reenable airtable warehouse resources
* airtable: use correct incoming id names in staging tables
* airtable: get tts actually working with prior updates
* airtable: gtfs service data
* airtable: add provider/gtfs table -- fixes #1487
* airtable: clean up some ctes
* airtable: clean up references based on some schema changes
* airtable: add relationship tests for foreign keys in mart
* airtable: start renaming int to base
* airtable: refactor staging tables to be historical; refactor get latest macro to enable daily extract selection
* airtable: convert staging to views rather than tables
* airtable: convert intermediate mapping tables to base
* always compile, but only check dbt run success after docs/metabase
* run tests even if run failed
* airtable: define key as metabase PK
* airtable: add equal row count tests for models with id mapping
* airtable: rename map to bridge
* update poetry.lock for dbt-metabase
* airtable: latest-only-ify bridge tables
* missed a couple
* airtable: make mart latest-only
* airtable: refactor dim service components
* airtable: specify metabase FK columns
* airtable: new fields & tables to address #1630
* airtable: make bridge tables date-aware and assorted small fixes
* get us going!
* airtable: address failing dbt tests -- minor tweaks
* airtable: more failing dbt tests
* airtable: refactor service components to handle duplicates
* airtable: fix legacy airtable source definition to reference views
* airtable: remove redundant metabase FK metadata
* airtable: fix test syntax
* airtable: use QUALIFY to simplify ranked queries
* fix: make airtable gcs operator use timestamps rather than time string
* fix(timestamp partitions): update calitp version to get schedule partition updates
* warehouse (payments): migrated payments_views_staging cleaned dags to models as well as validation tables to tests
* use new calitp version
* fix(timestamp partitions): explicitly use isoformat string
* style: rename CTEs to be more specific
* farm surrogate key macro: coalesce nulls in macro itself
* add notebook used to re-name a partition
* chore: remove pyup config file no longer in use
* chore: remove pyup ignore statement
* airtable: use ts instead of time
* add airtable mart to list of things synced to metabase
* update metabase database names again
* warehouse(payments_views_staging): split yml files into staging and source, added documentation for cleaned files, deleted old validation tables
* warehouse(payments_views_staging): added generic tests, added composite unique tests from dbt_packages, added docs file with references, materialized staging tables as views
* warehouse(payments_views_staging): added configuration to persist singular tests as tables in the warehouse
* warehouse(payments_views): migrated airflow dags for payments views to its own model in dbt, added metadata and generic tests, added dbt references
* print message if deploy is not set
* round lat/lons, specify 4m accuracy, add new resources
* print the documentation file being written
* add coord system, disable shapes for now due to size limit
* fix(fact daily trips timeout): wip incremental table
* update to good stable version of sqlfluff
* fix: make fact daily trips incremental -- WIP
* pass and/or ignore new rules
* linter
* fact daily trips: remove dev incremental check
* docs: update airtable prod maintenance instructions
* docs: add new dags to dependency diagram
* docs: add spacing to help w line wrapping
* docs: more spaces for line wrapping...
* dbt-metabase: update version in poetry; comment out failing relationship tests
* warehouse(payments_views): got payments_rides working and migrated, added yml and metadata, added payments_views validation tests and persisted tables, added payments_views_refactored with intermediate tables and got that to work
* get new calitp version
* import gcs models from calitp-py!
* missed a couple
* get us going!
* fix: make airtable gcs operator use timestamps rather than time string
* fix(timestamp partitions): update calitp version to get schedule partition updates
* fix(timestamp partitions): explicitly use isoformat string
* use new calitp version
* start experimenting with task queue options and metrics
* get this working and test performance with greenlets
* couple more metrics
* wip testing with multiple consumers at high volume
* start optimizing for lots of small tasks; have to make redis interaction fast
* fix key str format
* couple more libs
* wip
* wip on discussed changes
* get the keys from environ for now
* use new calitp py
* print a bit more
* we are just gonna get stuff in the env
* commit this before I break anything
* fmt
* bump calitp-py
* lint
* rename v2 to v3 since 2.X tags already exist
* kinda make this runnable
* new node pool just dropped
* get running in docker compose to kick the tires
* start on RT v3 k8s
* get the consumer working mostly?
* label redis pod appropriately
* tell consumer about temp rt secrets
* that was dumb
* ticker k8s!
* set expire time on the huey instance
* point consumer at svc account json
* avoid pulling the stacktrace in
* scrape on 9102
* bump to 16 workers per consumer
* bump jupyterhub storage to 32Gi
* add these back!
* add comment
* bring in new calitp and fix tick rounding
* improve metrics and labels
* warehouse(payments): removed payments_rides_refactor from yml file
* clean up labels
* get secrets from secret manager sdk before the consumer starts...
* missed this
* fix secrets volume and adjust affinities
* warehouse(payments): removed the airflow dags for the payments_views that were migrated, as well as the two test tables
* warehouse(payments): removed the old intermediate tables from the dbt project yaml file
* add content type header to bytes
* ugh whitespace
* warehouse: fixing linting error
* warehouse: fixing linting error again
* warehouse(dbt_project): added to-do comments in project config to remind where to move model schemas in the future
* fix: update Mountain Transit URL
* remove celery and gevent from pyproject deps
  Co-authored-by: Mjumbe Poe <[email protected]>
* we might as well specify huey app name by env as well just in case we end up on the same redis in the future
* write to the prod bucket!
* create a preprod version and deploy it
* run fewer workers in preprod
* move pull policies to patches, and only run 1 dev consumer
* add redis considerations to readme
* docs(datasets and tables): revised information on dbt docs for views tables based on PR review
* docs(datasets and tables): revised for readability
* docs(datasets and tables): revised docs information for gtfs schedule based on PR review
* docs(datasets and tables): fixed readability
* docs(datasets and tables): added new formatting, added gtfs rt dbt docs instructions
* docs(datasets and tables): revamped the overview page for datasets and tables
* docs(datasets and tables): cleaned up readability
* bump version and start adding more logging context
* specifically log request errors that do not come from raise_for_status
* set v3 image versions separately
* bump to 8 workers and improve log formatting
* formatting
* fix string representation of exception type in logs
* bump prod to 3.1
* oops
* hotfix version
* bump to 30m
* warehouse(airflow): deleted the empty payments_views_staging dag directory
* warehouse(airflow): deleted dummy_staging airflow task, removed gusty dependencies from other tables that relied on that task
* docs(airflow): edited the production dags docs to reflect changes in payments staging views dags
* docs(airflow): revised docs based on Laurie's comment re only listing enforced dependencies
* Update new-team-member.md
  Added missing meetings, deleted old meetings, and deleted auto-assign.
* docs(datasets and tables): reconfigured some pages for readability
* docs(datasets and tables): re-reviewed and added clarity
* fix (open data): align column publish metadata with open data dictionary -- suppress calitp hash, synthetic keys, and extraction date, add calitp_itp_id and url_number
* docs(production maintenance): added n/a for dependencies for payments_views
* docs(datasets and tables): created new page with content on how to use dbt docs, added to toc
* docs(datasets and tables): removed information on how to navigate dbt docs in favor of the new page created, added info to warehouse schema sections, created dbt project directory sections
* (analyst_docs): update gcloud commands
* fix(open data): make test_metadata attribute optional to account for singular tests
* docs(datasets and tables): reformatted for readability and conciseness
* docs(datasets and tables): revisions based on Laurie's review
* docs(datasets and tables): revised PR to put gtfs views tables used by ckan under the views doc
* fix(open data): suppress publishing stop_times because of size limit issue
* agencies.yml: update FCRTA and add Escalon Transit
* agencies.yml: rename escalon transit to etrans
* fix(airflow/gtfs_loader): replace non-utf-8 characters
* feat(airtable): add new columns per request #1674
* fix(airtable data): address review comments PR #1677
* fix: add WeHo RT URLs
* fix(ckan publishing): only add columns to data dictionary if they don't have publish.ignore set
* update calitp py and change log
* make docker compose work
* specify buckets and bump version in dev
* now do prod
* change logging
* add weho key
* bump gtfs rt v3 version
* bump calitp py
* deploy new image to dev
* get dev and prod working with bucket env vars
* bump calitp py and expire cache every 5 minutes
* deploy new cache clearing to prod/dev
* make sure calitp is updated, load secrets in ticker too
* fix docker compose, use new flags, deploy new image to dev
* bump prod
* add airtable age metric, bump version, scrape ticker
* delete experimental fact_daily_trips_inc incremental table that was not functioning correctly (#1681)
* docs: correct Transit Technology Stacks title (#1565)
  The Transit Technology Stacks header was not being linked to properly from the overview table. This fixes that.
* fix: update GRaaS URLs (#1690)
* New schedule pipeline validation job (#1648)
* wip on validation in new schedule pipeline
* bring in stuff from calitp storage, work on saving validations/outcomes
* wip getting this working
* use new calitp prerelease, fix filenames/content, remove break
* oops
* working!
* update lockfile
* unzip/validate schedule dag
* remove this
* bring in latest calitp-py
* extra print
* pass env vars into pod
* fix lint
* add readme
* bring in latest calitp
* fix print and formatting
* bring the outcome-only classes over, and use env var for bucket
* filter out nones for RT airtable records
* bring in latest calitp py
* get latest calitp
* use new env var and rename validation job results
* start updating airflow with new calitp py and using bucket env vars
* test schedule downloader with new calitp
* new calitp
* handle new calitp, better logging
* add env vars for new calitp
* put prefix_bucket back for parse_and_validate_rt and document env var configuration
* comments
* use new version of calitp py with good gcsfs (#1693)
* use new version of calitp py with good gcsfs
* use the regular release
* docs(agency): adding reference table for analysts to define agency, reference for pre-commit hooks (#1430)
* docs(agency): adding reference table for analysts to define agency in their research
* docs(agency): fixed table formatting error
* docs(agency): fixed table formatting error plus pre-commit hooks
* docs(pre-commit hooks): added information for using and troubleshooting pre-commit hooks
* docs: formatting errors, added missing capitalization
* docs: formatting table with list
* docs: formatting table with no line break - attempt 1
* docs: clarified language and spacing in table
* docs: clarified language in table
* docs: removing extra information from agency table
* docs: removing extra information from agency table pt 2
* docs: removing extra information from agency table pt 3
* docs: reworked table to include gtfs-provider-service relationships
* docs: added space for the gtfs provider's services section
* docs: added space for the gtfs provider's services section syntax corrections
* docs: added space for the gtfs provider's services section syntax corrections again
* docs: clarified information around gtfs provider relationships
* docs: clarified information around gtfs provider relationships and intro content
* docs: agency table revisions based on call with E
* docs(agency reference): incorporated E's feedback in the copy, added warehouse table instead of airtable table
* docs(agency reference): reformatted table
* docs(warehouse): added new table information for analyst agency reference now that the airtable migration is complete and the table was created; added CSS styling to prevent table scrolling
* docs: renamed python library file h1 to be more intuitive
* docs(conf): added comments explaining the added CSS preventing horizontal scroll in markdown tables
* docs(add to what_is_agency)
  Co-authored-by: natam1 <[email protected]>
  Co-authored-by: Charles Costanzo <[email protected]>
* we also have to pin a specific fsspec version directly in the requirements (#1694)
* Create SFTP ingest component for Elavon data (#1692)
* kubernetes: sftp-ingest-elavon: add server component
* kubernetes: sftp-server: add sshd configuration
  This enables functionality like chroot'd logins and disabling of shell logins.
* kubernetes: sftp-server: add readinessProbe
  Since the container is essentially built at startup, there is a sizeable time delta between container startup and ssh server startup. This addition helps the operator easily detect when installation is complete and the service is running.
* kubernetes: sftp-server: add cluster service
  This enables cluster workloads to log in using DNS names.
* kubernetes: sftp-server: refactor bootstrap script for better DRY
* kubernetes: prod-sftp-ingest-elavon: create production localization
* kubernetes: prod-sftp-ingest-elavon: add internet-service.yaml
  This exposes the SFTP port for inbound connections from the vendor.
* ci: prod-sftp-ingest-elavon.env: enable prod deployment

Co-authored-by: Charlie Costanzo <[email protected]>
Co-authored-by: tiffanychu90 <[email protected]>
Co-authored-by: Andrew Vaccaro <[email protected]>
Co-authored-by: Andrew Vaccaro <[email protected]>
Co-authored-by: Laurie Merrell <[email protected]>
Co-authored-by: Kegan Maher <[email protected]>
Co-authored-by: Laurie <[email protected]>
Co-authored-by: evansiroky <[email protected]>
Co-authored-by: Mjumbe Poe <[email protected]>
Co-authored-by: tiffanychu90 <[email protected]>
Co-authored-by: natam1 <[email protected]>
Co-authored-by: Charles Costanzo <[email protected]>
Co-authored-by: Github Action build-release-candidate <runner@fv-az123-804>
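The readinessProbe entry above notes that the SFTP container is built at startup, so the SSH server comes up well after the container does. Below is a minimal Python sketch of what "ready" can mean in that situation. It assumes that readiness means the port accepts a TCP connection and answers with an SSH identification banner. The actual probe definition in this commit's manifests is not shown here, and the function names and default port are hypothetical.

```python
import socket
import time


def ssh_server_ready(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if host:port accepts a TCP connection and sends an SSH banner.

    Hypothetical helper: an SSH server such as OpenSSH sends an identification
    string like "SSH-2.0-..." right after the connection opens. RFC 4253 also
    allows servers to send other lines first, which this sketch does not handle.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            banner = sock.recv(256)
    except OSError:
        # Refused, timed out, or reset: sshd is not accepting connections yet.
        return False
    return banner.startswith(b"SSH-")


def wait_until_ready(
    host: str, port: int = 22, attempts: int = 30, interval: float = 1.0
) -> bool:
    """Poll ssh_server_ready until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if ssh_server_ready(host, port):
            return True
        time.sleep(interval)
    return False
```

A Kubernetes `tcpSocket` probe checks only that the port accepts connections. Reading the banner also confirms that sshd itself is answering, rather than some other listener on that port.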