-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT: S3 remote file paths in one location #423
Conversation
tm_stop_crossing = LocalS3Location( | ||
bucket_name=springboard_bucket, | ||
file_prefix=os.path.join(tm_prefix, "STOP_CROSSING"), | ||
) | ||
tm_geo_node_file = LocalS3Location( | ||
bucket_name=springboard_bucket, | ||
file_prefix=os.path.join(tm_prefix, "TMMAIN_GEO_NODE.parquet"), | ||
) | ||
tm_route_file = LocalS3Location( | ||
bucket_name=springboard_bucket, | ||
file_prefix=os.path.join(tm_prefix, "TMMAIN_ROUTE.parquet"), | ||
) | ||
tm_trip_file = LocalS3Location( | ||
bucket_name=springboard_bucket, | ||
file_prefix=os.path.join(tm_prefix, "TMMAIN_TRIP.parquet"), | ||
) | ||
tm_vehicle_file = LocalS3Location( | ||
bucket_name=springboard_bucket, | ||
file_prefix=os.path.join(tm_prefix, "TMMAIN_VEHICLE.parquet"), | ||
) | ||
|
||
# output of public bus events published by LAMP | ||
bus_events = LocalS3Location( | ||
bucket_name=public_bucket, | ||
file_prefix="bus_vehicle_events", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't seem like this stuff was being used
rt_vehicle_positions = S3Location( | ||
bucket=S3_SPRINGBOARD, | ||
prefix=os.path.join(LAMP, "RT_VEHICLE_POSITIONS"), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulled all of these declarations out of the class to reduce the verbosity of referencing them in other parts of the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. it was too long and too gross already.
) | ||
|
||
|
||
class GTFSArchive(S3Location): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sub class with additional function for compressed gtfs archive files
9c8aa4a
to
1f52ba3
Compare
1f52ba3
to
c60cb65
Compare
Coverage of commit
|
@@ -54,6 +54,7 @@ pylint = "^3.2.2" | |||
pytest = "^8.2.1" | |||
pytest-cov = "^5.0.0" | |||
types-python-dateutil = "^2.9.0.20240316" | |||
pytest-env = "^1.1.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use this package for env vars during testing to avoid any ENV VAR mocking requirements.
@pytest.fixture(name="_set_env_vars") | ||
def fixture_set_env_vars() -> None: | ||
"""setup bucket names for this test""" | ||
os.environ["SPRINGBOARD_BUCKET"] = "springboard" | ||
os.environ["ERROR_BUCKET"] = "error" | ||
os.environ["ARCHIVE_BUCKET"] = "archive" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't need this with pytest-env
package
Coverage of commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have a few comments, mostly around if we should be making more constants for remote locations in remote_files.py
or generating those paths in the files using them based on the constants we have.
my intuition says everything should be defined there, but i would concede its unlikely we'd need some of these paths outside of the context they're defined in (like hyper file publication).
@@ -77,6 +78,13 @@ log_cli = true | |||
log_cli_level = "DEBUG" | |||
verbose = true | |||
|
|||
[tool.pytest.ini_options] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats a cool addition.
S3_SPRINGBOARD: str = os.environ.get("SPRINGBOARD_BUCKET", "") | ||
S3_PUBLIC: str = os.environ.get("PUBLIC_ARCHIVE_BUCKET", "") | ||
S3_INCOMING: str = os.environ.get("INCOMING_BUCKET", "") | ||
S3_ARCHIVE: str = os.environ.get("ARCHIVE_BUCKET", "") | ||
S3_ERROR: str = os.environ.get("ERROR_BUCKET", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would suggest changing the defaults from "" to "<bucket_name>_unset" to help with debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is the issue in publishing/performancedata.py
around this though where we string match the bucket, but we could handle that differently with a block in the run_on_app_start
similarly to how we run the alembic_upgrade_to_head
using an env var directly.
gtfs_tmp_folder = GTFS_PATH.replace( | ||
os.getenv("PUBLIC_ARCHIVE_BUCKET"), "/tmp" | ||
) | ||
gtfs_tmp_folder = compressed_gtfs.s3_uri.replace(S3_PUBLIC, "/tmp") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to build in the /tmp
local file locations into the S3Location
class as well, since it comes up as a pattern somewhat often and the location info is always the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at it more that looks like a hairy mess thats outside the scope here.
from lamp_py.runtime_utils.remote_files import ( | ||
S3_PUBLIC, | ||
LAMP, | ||
) | ||
|
||
from .gtfs_utils import BOSTON_TZ | ||
|
||
|
||
class AlertsS3Info: | ||
"""S3 Constant info for Alerts Parquet File""" | ||
|
||
bucket_name: str = os.environ.get("PUBLIC_ARCHIVE_BUCKET", "") | ||
s3_path: str = "s3://" + os.path.join( | ||
bucket_name, "lamp", "tableau", "alerts", "LAMP_RT_ALERTS.parquet" | ||
S3_PUBLIC, LAMP, "tableau", "alerts", "LAMP_RT_ALERTS.parquet" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this should be another instance of the S3Location
class rather than building up the path in this file.
rt_vehicle_positions = S3Location( | ||
bucket=S3_SPRINGBOARD, | ||
prefix=os.path.join(LAMP, "RT_VEHICLE_POSITIONS"), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. it was too long and too gross already.
src/lamp_py/tableau/jobs/rt_rail.py
Outdated
remote_parquet_path=f"s3://{os.getenv('PUBLIC_ARCHIVE_BUCKET')}/lamp/tableau/rail/LAMP_ALL_RT_fields.parquet", | ||
remote_parquet_path=f"s3://{S3_PUBLIC}/{LAMP}/tableau/rail/LAMP_ALL_RT_fields.parquet", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again it feels like something that could be a constant in remote_files.py
rather than building it up here.
dcce7c1
to
6e32feb
Compare
Coverage of commit
|
LGTM. Merge away. |
PR #423 re-factor broke compressed GTFS process by prefixing the gtfs_temp_folder declaration with s3://
This changes attempts to consolidate references to S3 paths all in one location.
Original version of
remote_files.py
was modified to reduce the verbosity of calling resources defined by the file. Also added all bucket references and lamp file prefix constants to the file.Hopefully I found all the places where we reference S3 file locations....
Asana Task: https://app.asana.com/0/1205827492903547/1207771349226027