Deploy elavon SFTP credentials into production (#1749)
* airtable: start renaming int to base

* airtable: refactor staging tables to be historical; refactor get latest macro to enable daily extract selection

* airtable: convert staging to views rather than tables

* airtable: convert intermediate mapping tables to base

* always compile, but only check dbt run success after docs/metabase

* run tests even if run failed

* airtable: define key as metabase PK

* airtable: add equal row count tests for models with id mapping

* airtable: rename map to bridge

* update poetry.lock for dbt-metabase

* airtable: latest-only-ify bridge tables

* missed a couple

* airtable: make mart latest-only

* airtable: refactor dim service components

* airtable: specify metabase FK columns

* airtable: new fields & tables to address #1630

* airtable: make bridge tables date-aware and assorted small fixes

* get us going!

* airtable: address failing dbt tests -- minor tweaks

* airtable: more failing dbt tests

* airtable: refactor service components to handle duplicates

* airtable: fix legacy airtable source definition to reference views

* airtable: remove redundant metabase FK metadata

* airtable: fix test syntax

* airtable: use QUALIFY to simplify ranked queries

* fix: make airtable gcs operator use timestamps rather than time string

* fix(timestamp partitions): update calitp version to get schedule partition updates

* warehouse (payments): migrated payments_views_staging cleaned dags to models as well as validation tables to tests

* use new calitp version

* fix(timestamp partitions): explicitly use isoformat string

* style: rename CTEs to be more specific

* farm surrogate key macro: coalesce nulls in macro itself

* add notebook used to re-name a partition

* chore: remove pyup config file

no longer in use

* chore: remove pyup ignore statement

* airtable: use ts instead of time

* add airtable mart to list of things synced to metabase

* update metabase database names again

* warehouse(payments_views_staging): split yml files into staging and source, added documentation for cleaned files, deleted old validation tables

* warehouse(payments_views_staging): added generic tests, added composite unique tests from dbt_packages, added docs file with references, materialized staging tables as views

* warehouse(payments_views_staging): added configuration to persist singular tests as tables in the warehouse

* warehouse(payments_views): migrated airflow dags for payments views to its own model in dbt, added metadata and generic tests, added dbt references

* print message if deploy is not set

* round lat/lons, specify 4m accuracy, add new resources

* print the documentation file being written

* add coord system, disable shapes for now due to size limit

* fix(fact daily trips timeout): wip incremental table

* update to good stable version of sqlfluff

* fix: make fact daily trips incremental -- WIP

* pass and/or ignore new rules

* linter

* fact daily trips: remove dev incremental check

* docs: update airtable prod maintenance instructions

* docs: add new dags to dependency diagram

* docs: add spacing to help w line wrapping

* docs: more spaces for line wrapping...

* dbt-metabase: update version in poetry; comment out failing relationship tests

* warehouse(payments_views): got payments_rides working and migrated, added yml and metadata, added payments_views validation tests and persisted tables, added payments_views_refactored with intermediate tables and got that to work

* get new calitp version

* import gcs models from calitp-py!

* missed a couple

* get us going!

* fix: make airtable gcs operator use timestamps rather than time string

* fix(timestamp partitions): update calitp version to get schedule partition updates

* fix(timestamp partitions): explicitly use isoformat string

* use new calitp version

* start experimenting with task queue options and metrics

* get this working and test performance with greenlets

* couple more metrics

* wip testing with multiple consumers at high volume

* start optimizing for lots of small tasks; have to make redis interaction fast

* fix key str format

* couple more libs

* wip

* wip on discussed changes

* get the keys from environ for now

* use new calitp py

* print a bit more

* we are just gonna get stuff in the env

* commit this before I break anything

* fmt

* bump calitp-py

* lint

* rename v2 to v3 since 2.X tags already exist

* kinda make this runnable

* new node pool just dropped

* get running in docker compose to kick the tires

* start on RT v3 k8s

* get the consumer working mostly?

* label redis pod appropriately

* tell consumer about temp rt secrets

* that was dumb

* ticker k8s!

* set expire time on the huey instance

* point consumer at svc account json

* avoid pulling the stacktrace in

* scrape on 9102

* bump to 16 workers per consumer

* bump jupyterhub storage to 32gi

* add these back!

* add comment

* bring in new calitp and fix tick rounding

* improve metrics and labels

* warehouse(payments): removed payments_rides_refactor from yml file

* clean up labels

* get secrets from secret manager sdk before the consumer starts...

* missed this

* fix secrets volume and adjust affinities

* warehouse(payments): removed the airflow dags for the payments_views that were migrated, as well as the two test tables

* warehouse(payments): removed the old intermediate tables from the dbt project yaml file

* add content type header to bytes

* ugh whitespace

* warehouse: fixing linting error

* warehouse: fixing linting error again

* warehouse(dbt_project): added to-do comments in project config to remind where to move model schemas in the future

* fix: update Mountain Transit URL

* remove celery and gevent from pyproject deps

Co-authored-by: Mjumbe Poe <[email protected]>

* we might as well specify huey app name by env as well just in case we end up on the same redis in the future

* write to the prod bucket!

* create a preprod version and deploy it

* run fewer workers in preprod

* move pull policies to patches, and only run 1 dev consumer

* add redis considerations to readme

* docs(datasets and tables): revised information on dbt docs for views tables based on PR review

* docs(datasets and tables): revised for readability

* docs(datasets and tables): revised docs information for gtfs schedule based on PR review

* docs(datasets and tables): fixed readability

* docs(datasets and tables): added new formatting, added gtfs rt dbt docs instructions

* docs(datasets and tables): revamped the overview page for datasets and tables

* docs(datasets and tables): cleaned up readability

* bump version and start adding more logging context

* specifically log request errors that do not come from raise_for_status

* set v3 image versions separately

* bump to 8 workers and improve log formatting

* formatting

* fix string representation of exception type in logs

* bump prod to 3.1

* oops

* hotfix version

* bump to 30m

* warehouse(airflow): deleted the empty payments_views_staging dag directory

* warehouse(airflow): deleted dummy_staging airflow task, removed gusty dependencies from other tables that relied on that task

* docs(airflow): edited the production dags docs to reflect changes in payments staging views dags

* docs(airflow): revised docs based on Laurie's comment re only listing enforced dependencies

* Update new-team-member.md

Added missing meetings, deleted old meetings, deleted auto-assign

* docs(datasets and tables): reconfigured some pages for readability

* docs(datasets and tables): re-reviewed and added clarity

* fix(open data): align column publish metadata with open data dictionary -- suppress calitp hash, synthetic keys, and extraction date, add calitp_itp_id and url_number

* docs(production maintenance): added n/a for dependencies for payments_views

* docs(datasets and tables): created new page with content on how to use dbt docs, added to toc

* docs(datasets and tables): removed information on how to navigate dbt docs in favor of the new page created, added info to warehouse schema sections, created dbt project directory sections

* (analyst_docs): update gcloud commands

* fix(open data): make test_metadata attribute optional to account for singular tests

* docs(datasets and tables): reformatted for readability and conciseness

* docs(datasets and tables): revisions based on Laurie's review

* docs(datasets and tables): revised PR to put gtfs views tables used by ckan under the views doc

* fix(open data): suppress publishing stop_times because of size limit issue

* agencies.yml: update FCRTA and add Escalon Transit

* agencies.yml: rename escalon transit to etrans

* fix(airflow/gtfs_loader): replace non-utf-8 characters

* feat(airtable): add new columns per request #1674

* fix(airtable data): address review comments PR #1677

* fix: add WeHo RT URLs

* fix(ckan publishing): only add columns to data dictionary if they don't have publish.ignore set

* update calitp py and change log

* make docker compose work

* specify buckets and bump version in dev

* now do prod

* change logging

* add weho key

* bump gtfs rt v3 version

* bump calitp py

* deploy new image to dev

* get dev and prod working with bucket env vars

* bump calitp py and expire cache every 5 minutes

* deploy new cache clearing to prod/dev

* make sure calitp is updated, load secrets in ticker too

* fix docker compose, use new flags, deploy new image to dev

* bump prod

* add airtable age metric, bump version, scrape ticker

* delete experimental fact_daily_trips_inc incremental table that was not functioning correctly (#1681)

* docs: correct Transit Technology Stacks title (#1565)

The Transit Technology Stacks header was not properly being linked to in the overview table. This fixes that.

* fix: update GRaaS URLs (#1690)

* New schedule pipeline validation job (#1648)

* wip on validation in new schedule pipeline

* bring in stuff from calitp storage, work on saving validations/outcomes

* wip getting this working

* use new calitp prerelease, fix filenames/content, remove break

* oops

* working!

* update lockfile

* unzip/validate schedule dag

* remove this

* bring in latest calitp-py

* extra print

* pass env vars into pod

* fix lint

* add readme

* bring in latest calitp

* fix print and formatting

* bring the outcome-only classes over, and use env var for bucket

* filter out nones for RT airtable records

* bring in latest calitp py

* get latest calitp

* use new env var and rename validation job results

* start updating airflow with new calitp py and using bucket env vars

* test schedule downloader with new calitp

* new calitp

* handle new calitp, better logging

* add env vars for new calitp

* put prefix_bucket back for parse_and_validate_rt and document env var configuration

* comments

* use new version of calitp py with good gcsfs (#1693)

* use new version of calitp py with good gcsfs

* use the regular release

* docs(agency): adding reference table for analysts to define agency, reference for pre-commit hooks (#1430)

* docs(agency): adding reference table for analysts to define agency in their research

* docs(agency): fixed table formatting error

* docs(agency): fixed table formatting error plus pre-commit hooks

* docs(pre-commit hooks): added information for using and troubleshooting pre-commit hooks

* docs: formatting errors, added missing capitalization

* docs: formatting table with list

* docs: formatting table with no line break - attempt 1

* docs: clarified language and spacing in table

* docs: clarified language in table

* docs: removing extra information from agency table

* docs: removing extra information from agency table pt 2

* docs: removing extra information from agency table pt 3

* docs: reworked table to include gtfs-provider-service relationships

* docs: added space for the gtfs provider's services section

* docs: added space for the gtfs provider's services section syntax corrections

* docs: added space for the gtfs provider's services section syntax corrections again

* docs: clarified information around gtfs provider relationships

* docs: clarified information around gtfs provider relationships and intro content

* docs: agency table revisions based on call with E

* docs(agency reference): incorporated E's feedback in the copy, added warehouse table instead of airtable table

* docs(agency reference): reformatted table

* docs(warehouse): added new table information for analyst agency reference now that the airtable migration is complete and the table was created. added css styling to prevent table scrolling

* docs: renamed python library file h1 to be more intuitive

* docs(conf): added comments explaining the added css preventing horizontal scroll in markdown tables

* docs(add to what_is_agency)

* docs(warehouse): fixed some typos, errors, and formatting issues

Co-authored-by: natam1 <[email protected]>
Co-authored-by: Charles Costanzo <[email protected]>

* we also have to pin a specific fsspec version directly in the requirements (#1694)

* Create SFTP ingest component for Elavon data (#1692)

* kubernetes: sftp-ingest-elavon: add server component

* kubernetes: sftp-server: add sshd configuration

This enables functionality like chroot'd logins and disabling of shell
logins.
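
A minimal sshd_config fragment along these lines would achieve that (an illustrative sketch only — the user name and paths are assumptions, not taken from the actual sftp-server configuration):

```
# Illustrative sshd_config fragment (assumed; the real config may differ).
# Use the in-process SFTP subsystem so no shell binaries are needed in the jail.
Subsystem sftp internal-sftp

Match User elavon
    # Jail the session to the user's home directory (must be root-owned).
    ChrootDirectory /home/%u
    # Force SFTP and disable shell login.
    ForceCommand internal-sftp
    # Prevent the account from being used as a network tunnel.
    AllowTcpForwarding no
    X11Forwarding no
```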

* kubernetes: sftp-server: add readinessProbe

Since the container is essentially built at startup, there is a sizeable
time delta between container startup and ssh server startup. This
addition helps the operator easily detect when installation is complete
and the service is running.
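
For a build-at-startup container like this, a TCP-based readinessProbe is the natural fit — the pod is only marked Ready once sshd actually accepts connections. A hedged sketch (the port and timings are assumptions, not copied from the manifest):

```yaml
# Hypothetical probe for the sftp-server container (values assumed).
readinessProbe:
  tcpSocket:
    port: 22              # assumed SSH/SFTP port
  initialDelaySeconds: 15 # allow time for the startup-time "build"
  periodSeconds: 10
  failureThreshold: 6
```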

* kubernetes: sftp-server: add cluster service

This enables cluster workloads to log in using a DNS name.

* kubernetes: sftp-server: refactor bootstrap script for better DRY

* kubernetes: prod-sftp-ingest-elavon: create production localization

* kubernetes: prod-sftp-ingest-elavon: add internet-service.yaml

This exposes the SFTP port for inbound connections from the vendor.
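
Exposing an SFTP port for inbound vendor connections is typically done with a LoadBalancer Service; a sketch under stated assumptions (the names, label selector, and source range are hypothetical, not taken from internet-service.yaml):

```yaml
# Illustrative internet-facing Service; the real internet-service.yaml
# may pin a reserved IP or restrict sources differently.
apiVersion: v1
kind: Service
metadata:
  name: sftp-ingest-elavon-internet   # assumed name
spec:
  type: LoadBalancer
  selector:
    app: sftp-server                  # assumed pod label
  ports:
    - name: sftp
      port: 22
      targetPort: 22
  # Optionally limit inbound traffic to the vendor's address range:
  # loadBalancerSourceRanges:
  #   - 203.0.113.0/24
```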

* ci: prod-sftp-ingest-elavon.env: enable prod deployment

* Fix typo in `what is agency` (#1698)

its --> it's

* limit schedule validation jobs with a pool (#1700)

* Created new row-level access policy macro and applied it to payments_rides (#1697)

* created new row-level access policy and applied it to payments rides with newly generated service accounts

* ran pre-commit hooks to fix failing actions

Co-authored-by: Charles Costanzo <[email protected]>

* deploy voila fix (#1702)

* disable autodetect if schema is specified (#1704)

* Create v2 RT parsing and validation jobs in Airflow and creates external tables (#1691)

* start on new parsing job

* comment and fmt

* wip getting parsing working

* fmt

* get parsing working!

* save outcomes file properly

* remove old validator and dupe log

* this is only jsonl right now, so this workaround is bad

* wip on validation

* wip

* get parsing working, start simplifying

* get validation working with schedules referenced by airtable!

* missed this

* get the actual rt v2 airflow jobs mostly working

* missed this

* run v2 RT jobs at :15 instead of :30

* convert metadata field names to bq-safe

* fix being able to template bucket and test out rt_service_alerts_v2 external table

* add outcomes external table to test

* wip trying to get a debugger to test pydantic custom serialization

* fix rt outcome serialization to be bq safe

* create rest of rt v2 external tables

* couple small fixes

* start addressing PR comments

* address PR comment

* add ci/cd action to build gtfs-rt-parser-v2 image

* Fix: skip amplitude_benefits DAG if 404 (#1705)

* fix(amplitude): mark skip when 404 is encountered

* chore(amplitude): add some logging statements around API call

* Gtfs schedule unzip v2 (#1696)

* gtfs loader v2 wip

* gtfs unzipper v2: semi-working WIP -- can unzip at least one zipfile

* address initial review comments

* bump calitp version

* gtfs unzipper v2: working version with required functionality

* update calitp and make the downloader run with it

* gtfs unzipper v2: get working in airflow; use logging

* rename to distinguish zipfile from extracted files within zipfile

* resolve reviewer comments

* gtfs unzipper v2: refactor to raise exceptions on unparseable zips

* gtfs unzipper: further simplify exception handling

* final tweaks -- refactor of checking for invalid zip structure, tighten up processing of valid files

* comment typos/clarifications

Co-authored-by: Andrew Vaccaro <[email protected]>

* warehouse: added fare_systems transit database mart table (#1701)

* warehouse: added fare_systems transit database mart table

* warehouse: fixed duplicate doc macro issue for fare_systems

* explicitly declared schema

* removed columns no longer relevant

* warehouse: added bridge table for fare_systems x services

* warehouse: added bridge table for fare_systems x services to yaml

* Clean up RT outcomes (#1709)

* remove unnecessary json_encoders

* just save the extract path

* add a dockerignore

* grant access to payments_rides for non-agency users (#1714)

* grant access to payments_rides for non-agency users

* just use calitp domain and add a couple other users

* Run CKAN weekly, with multipart uploads as needed (#1710)

* wip getting multipart upload to ckan working

* remove before I forget

* mirror the example script... we get 500s with too many chunks

* commit this while it is working

* add this back

* allow env var to control target and bucket

* create weekly task to run publish california_open_data

* allow manifest to be in gcs

* get this actually working...

* dockerignore

* clean up names, add resource requests, make work in pod operator

* address PR comments

* load this from a secret (#1717)

* Initial dbt models to support GTFS guidelines checks (#1712)

* initial work towards #1688

* gtfs guidelines initial implementation: tweaks & improvements

* gtfs guidelines: add metabase semantic type for calitp agency name

* sync new dataset to metabase

* gtfs guidelines: rename table, formatting updates

* rename compliance gtfs feature per PR review

* Add RT VP vs Sched Table (#1708)

* add table

* add table

* add operator

* fix sql syntax

* fix failing indentations

* add unique test

* fix .yml test

* Create local Dockerfile and bash script for dbt development (#1711)

* start on local dev dockerfile

* handle local profiles dir

* make dbt docker work with local google credentials

* add build-essentials per recommendation

* update poetry install method and add libgdal-dev

* poetry changed its bin location

* Improvements to dbt artifacts and publish workflow (#1726)

* add ts partition to publish artifacts

* also save artifacts with timestamps vs just latest

* start simplifying publish script, proper dry runs, reading manifest from gcs

* fix publish assert, use env vars, simplify logging

* allow resource descriptions in publishing, allow direct remote writing

* ugh

* need to be utc

* bring in simplified descriptions

* missed bucket

* upload metadata/dictionary to gcs for ckan; also fix bug

* update ckan docs to reflect publishing changes

* actually these should always get written

* env vars not templating

* fix timestamped artifact names

* pretty print

* address pr comments

* update ckan publishing docs

* actually set ckan precision fields and use them

* uppercase field types and allow specifying a model to publish

* bad dict key

* these are length 7

* lats are only 6 digits

* warehouse documentation: add calitp_itp_id and calitp_url_number metadata to several dimensional columns (#1733)

* airtable organizations: define external table schemas (#1734)

* Upgrade schedule validator and save version as metadata (#1729)

* update to v3 validator, fix dockerfile

* finally deploy the schedule validator image through github actions

* bring in latest calitp

* use new calitp, simplify metadata, add version to notice rows, couple qol improvements

* change flag per v3

* use poetry export install here too

* lock

* export install here too

* add verbose, just copy jar instead of download

* use environ directly

* Set RT validator version as metadata and fix a bug (#1732)

* set rt validator version as metadata

* add validator version in metadata and put extract under a key

* fix schedule data exception string representation and assert after outcomes upload

* fix poetry in docker, lock

* use export and install

* update typer

* fix schedule downloading... also add url filter to cli

* get latest validator from github just in case, and keep name

* rename this here too

* address PR comments

* add pool for airtable (#1743)

* deprecate airtable v1 extracts (#1699)

* deprecate airtable v1 extracts

* delete v1 airtable operator

* Change column name to fix run error (#1730)

* change date col name

* fix service_date col

* chore: remove evansiroky from most CODEOWNERS items (#1735)

* kubernetes: prod-sftp-ingest-elavon: add elavon ssh public key (#1742)

Co-authored-by: Laurie Merrell <[email protected]>
Co-authored-by: Andrew Vaccaro <[email protected]>
Co-authored-by: Andrew Vaccaro <[email protected]>
Co-authored-by: Charlie Costanzo <[email protected]>
Co-authored-by: Kegan Maher <[email protected]>
Co-authored-by: Laurie <[email protected]>
Co-authored-by: evansiroky <[email protected]>
Co-authored-by: Mjumbe Poe <[email protected]>
Co-authored-by: tiffanychu90 <[email protected]>
Co-authored-by: tiffanychu90 <[email protected]>
Co-authored-by: natam1 <[email protected]>
Co-authored-by: Charles Costanzo <[email protected]>
Co-authored-by: Angela Tran <[email protected]>
Co-authored-by: natam1 <[email protected]>
Co-authored-by: Github Action build-release-candidate <runner@fv-az173-876>
16 people authored Sep 6, 2022
1 parent df492ca commit b1ffb49
Showing 2 changed files with 2 additions and 2 deletions.
kubernetes/apps/charts/jupyterhub/values.yaml (1 addition, 1 deletion):

@@ -8,7 +8,7 @@ jupyterhub:
     defaultUrl: "/lab"
     image:
       name: ghcr.io/cal-itp/calitp-py
-      tag: hub-v14
+      tag: hub-v15
     memory:
       # Much more than 10 and we risk bumping up against the actual capacity of e2-highmem-2
       limit: 10G
sftp-user-config ConfigMap (1 addition, 1 deletion):

@@ -3,4 +3,4 @@ kind: ConfigMap
 metadata:
   name: sftp-user-config
 data:
-  authorized_keys: ''
+  authorized_keys: 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDDo2kni8Bu16miTauaZfvZlIDj/90t9XIr7PP03SjTQb6bzQpioOBGENKcO2eOCbyYxFfTP1jwIUFElHgsY5OQy7LUbywbzIiZRCE0kU5B7O8uNwUY7kl7nZYHYzFccDug+czfkoUBZEHVj1pnVejgHjKEomp8XFRnaeBmpQm46A0IptM+AT0u3mNkJ7kt5RRC0BwKCD2a3Nn61gD37HEjqMK8seqw/c5i1UZ2EdDEFQXoiMH2P95JxyshRv0mpa8vVBdEjmOlDQXfNarWhDcll2an3h3dm0sAtbiTPdktRl2DC1pZeiWAiitqJ6f0g+YFfC5AwX+/4m/anlK8JnH7FTuiI1dSHukf98OutWMsBWl0huuC/bO9qfQTJkqcHsmCibkRujuHCP6FXNPmHwN1FFK3AYADeEiQ5nq4QRGtN1zOLX2jz21ylpgtK8V8LOxpu/r38OqkuzEh48n3v6YrqGY58w0P+z3ywQWAzNDLr0c05Q1kU9m4YOg8NkgAU/vilUXDNfjgBWsYJHyTQQQjavj7NGfuoFgItXTki5y+ccPFiU99YU+gbL6iqJvC8qhqY8H1fafWM0tx9i4TvPirrNxXty8mS1zw9eERtDb17SkNS794ydtZ3Ohui5L78Uo4Z/WTRKHmupuBP3oFLT+tZhYBwKgnG+Y/tFrz5ov6+w== USBank' # elavon
