From 6acf813183628c8fbe0caf155a50f61081543e6c Mon Sep 17 00:00:00 2001 From: AshishKuls Date: Thu, 31 Aug 2023 22:09:20 +0000 Subject: [PATCH] deploy: 7be0b760714e84ec8a3bdfb89ab64f046c2a215a --- assessment.html | 28 +----- assessment.md | 19 +--- development.html | 188 +++++++++++++++++++++++++++++++-------- development.md | 76 ++++++++-------- index.html | 24 +++++ index.md | 13 ++- search/search_index.json | 2 +- sitemap.xml.gz | Bin 258 -> 258 bytes 8 files changed, 231 insertions(+), 119 deletions(-) diff --git a/assessment.html b/assessment.html index 483df19..ae21880 100644 --- a/assessment.html +++ b/assessment.html @@ -626,13 +626,6 @@ - - -
  • - - Use Cases and Key Limitations - -
  • @@ -898,13 +891,6 @@ - - -
  • - - Use Cases and Key Limitations - -
  • @@ -928,10 +914,10 @@

    Assessment

    RSM Configuration

    -

    Different scenario runs with varying configurations were done during the RSM development to then select a final set of configuration parameters to move forward with the overall assessment of RSM.

    -

    TODO: Include table with different configurations and corresponding run time.

    +

    The team conducted tests using different combinations of RSM parameters, including the number of RSM zones (1000 or 2000), the default sampling rate (15%, 25%, or 100%), enabling or disabling the intelligent sampler, and the number of global iterations (2 or 3), among other factors. The number of RSM zones had its most significant influence on the runtime of the highway assignment process. Since the highway assignment runtime was already low with 1000 RSM zones, there was no motivation to explore fewer RSM zones. Altering the sampling rate had a greater impact on the runtime of the demand model (CT-RAMP) than changing the number of RSM zones. Runtimes varied across these test runs depending on the specific configuration. Key regional-level metrics were analyzed across the test runs to understand the trade-off between a shorter RSM runtime and RSM results that remain similar to ABM. Based on this analysis, the team collectively determined that the “optimal” configuration for the MVP (Minimum Viable Product) version of the RSM is 2000 RSM zones, a 25% default sampling rate, the intelligent sampler turned off, and 2 global iterations. This configuration was used for the overall assessment of the RSM.

    Calibration

    Aggregating the ABM zones to RSM zones distorts the share of walk trips coming out of the model. With the RSM model configuration identified above (Rapid Zones, Global Iterations, Sample Rate, etc.), tour mode choice was calibrated so that the RSM mode shares match the ABM2+ mode shares, primarily for walk trips. Calibration constants were applied to the tour mode choice UEC for the School, Maintenance, and Discretionary tour purposes. The mode shares for the Work and University purposes were reasonable, so no calibration was applied to those purposes.

    +

    RSM-specific constants were added to the Tour Mode Choice UEC (TourModeChoice.xls) for some of the tour purposes. The walk mode share for the Maintenance and Discretionary purposes was first adjusted by calibrating an RSM-specific constant row and applying it in the UEC. Furthermore, for Maintenance or Discretionary tours that involve escorting, an additional calibration constant was introduced to further adjust the walk mode share of those escort tours. Similarly, a different set of constants was added to calibrate the School tour purpose. There was no need to calibrate mode choice for any other tour purpose, as the RSM results for those purposes were reasonable.
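The calibration procedure itself is not spelled out here. A common way to calibrate alternative-specific constants in a logit mode choice model is to repeatedly add ln(target share / simulated share) to each mode's constant until the simulated shares match the targets. The sketch below illustrates that general technique on a single toy market segment; the function names, mode names, utilities, and target shares are hypothetical, and this is not necessarily the procedure the team used to set the RSM-specific rows in TourModeChoice.xls.

```python
import math


def mnl_shares(utilities):
    """Multinomial logit shares for a dict of mode -> systematic utility."""
    top = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {mode: math.exp(u - top) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: e / total for mode, e in exp_u.items()}


def calibrate_constants(base_utilities, target_shares, max_iter=50, tol=1e-6):
    """Adjust one constant per mode until simulated shares match the targets.

    Uses the classic update: constant[m] += ln(target[m] / simulated[m]).
    """
    constants = {mode: 0.0 for mode in base_utilities}
    shares = mnl_shares(base_utilities)
    for _ in range(max_iter):
        if max(abs(shares[m] - target_shares[m]) for m in shares) < tol:
            break
        for m in constants:
            constants[m] += math.log(target_shares[m] / shares[m])
        shares = mnl_shares(
            {m: base_utilities[m] + constants[m] for m in base_utilities}
        )
    return constants, shares


# Hypothetical utilities and donor-model (ABM2+) target shares for one segment.
base = {"drive": 0.5, "transit": -1.0, "walk": -0.5}
target = {"drive": 0.70, "transit": 0.10, "walk": 0.20}
constants, shares = calibrate_constants(base, target)
print({m: round(s, 3) for m, s in shares.items()})
```

In a single segment the update reaches the targets after one step; in the full model, utilities differ across tours, so several rounds of running the model and updating the constants are typically needed. Only differences between constants matter in a logit model, which is why in practice one mode's constant is often held fixed.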

    Note that a minor calibration will be required for RSM when the number of rapid zones is changed.

    Here is how the mode share and VMT compare before and after the calibration for RSM. “Donor model” in the charts below refers to the ABM2+ run.

    @@ -977,16 +963,6 @@

    Local Transit Changes

    Rapid 637 BRT

    TODO: Add some text to explain how this test was performed using the study area parameter

    TODO: Add outcome screenshot

    -

    Use Cases and Key Limitations

    -

    Based on set of tests done as part of this project, RSM performs well for regional scale roadway projects (e.g., auto operating costs and mileage fee, TNC costs and wait times etc.) and regional scale transit projects (transit fare, headway changes etc.). RSM also performed well for land-use change policies. Lastly, RSM was also tested for local roadway changes (e.g., managed lanes conversion) and local transit changes (e.g., new BRT line), and the results indicate that those policies are reasonably represented by RSM as well.

    -

    Here are some of the current limitations of RSM:

    - diff --git a/assessment.md b/assessment.md index 58a8c32..8fb402d 100644 --- a/assessment.md +++ b/assessment.md @@ -1,12 +1,11 @@ ## RSM Configuration - -Different scenario runs with varying configurations were done during the RSM development to then select a final set of configuration parameters to move forward with the overall assessment of RSM. - -TODO: Include table with different configurations and corresponding run time. +The team conducted tests using different combinations for the RSM parameters, including the number of RSM zones (1000, 2000), default sampling rates (15%, 25%, 100%), enabling or disabling the intelligent sampler, and choosing the number of global iterations (2 or 3), among other factors. The most significant influence of the number of RSM zones was observed on the runtime of the highway assignment process. Since the highway assignment runtime was already low with 1000 RSM zones, there was no motivation to explore lower RSM zone number. Altering the sampling rate had a greater impact on the runtime of the demand model (CT-RAMP) compared to changing the number of RSM zones. These test runs exhibited varying runtimes depending on the specific configuration. Key metrics at the regional level were analyzed across these different test runs to comprehend the trade-off between improved runtime for RSM and achieving RSM results that are similar to ABM. Based on this, the team collectively determined that for the MVP (Minimum Viable Product) version of the RSM, the "optimal" configuration would be to use 2000 RSM zones, a 25% default sampling rate, the intelligent sampler turned off, and 2 global iterations and this RSM configuration was used to move forward with the overall assessment of the RSM. ## Calibration Aggregating the ABM zones to RSM zones, distorts the walk trips share coming out of the model. With the model configuration (Rapid Zones, Global Iterations, Sample Rate, etc.) 
for RSM as identified above, tour mode choice calibration was performed to match the RSM mode share to ABM2+ mode share, primarily to match the walk trips. A calibration constant was applied to the tour mode choice UEC to School, Maintenance, Discretionary tour purpose. The mode share for Work and University purpsoe were reasonable, therefore the calibration wasn't applied to those purposes. +RSM specific constants were added to the Tour Mode Choice UEC (TourModeChoice.xls) to some of the tour purposes. The Walk mode share for the `Maintenance` and `Discretionary` purposes was first adjusted by calibrating and applying n RSM specific constant row to the UEC. Furthermore, in cases where the tour involved escorting for Maintenance or Discretionary purposes, an additional calibration constant was introduced to further adjust the walk mode share for such escort tours. Similarly, a differeent set of constants were added to calibrate the `School` tour purpose. There was no need to calibrate mode choice for any other tour purpose as those were reasonable from RSM. + Note that a minor calibration will be required for RSM when number of rapid zones are changed. Here is how the mode share and VMT compares before and after the calibration for RSM. Donor model in the charts below refers to the ABM2+ run. @@ -105,15 +104,3 @@ TODO: Add outcome screenshot TODO: Add some text to explain how this test was performed using the study area parameter TODO: Add outcome screenshot - - -## Use Cases and Key Limitations -Based on set of tests done as part of this project, RSM performs well for regional scale roadway projects (e.g., auto operating costs and mileage fee, TNC costs and wait times etc.) and regional scale transit projects (transit fare, headway changes etc.). RSM also performed well for land-use change policies. 
Lastly, RSM was also tested for local roadway changes (e.g., managed lanes conversion) and local transit changes (e.g., new BRT line), and the results indicate that those policies are reasonably represented by RSM as well. - -Here are some of the current limitations of RSM: - -- The scope of the RSM is “passenger” travel. Policies and/or infrastructure that primarily impact commercial travel (e.g., truck lanes) will not be well represented. -- Minor re-calibration of the mode choice was necessary to match observed walk trips. Large changes to the number of zones will likely require recalibration. -- The spatial aggregation reduces the RSM’s ability to represent to simulate infrastructure and/or policies that act at small scales (e.g., pedestrian infrastructure). -- Policies related to the adoption of automated vehicles cannot be currently represented. RSM currently skips running the Household AV Allocation module. -- While the RSM has been tested, the testing has not been extensive. More extensive testing is likely to surface additional issues. Additional testing will be required to evaluate if RSM can be a viable tool for other policies that interests SANDAG. \ No newline at end of file diff --git a/development.html b/development.html index 160bcf4..309f148 100644 --- a/development.html +++ b/development.html @@ -732,60 +732,172 @@

    Zone Aggregator

    Input Aggregator

    The input aggregator module of RSM aggregates several input files, UEC (SOA) files, and non-ABM model outputs of the donor model to the new RSM zones. The main inputs to this module are the location of the donor model, the RSM socioeconomic file, and the TAZ and MGRA crosswalks. The module reads the original socioeconomic file and adds the intersection count and several density variables that were originally generated by the 4D module of the current ABM2+ model; this is done in RSM because the 4D module is skipped when running RSM. The module then uses the crosswalk between MGRAs and RSM zones to aggregate the original socioeconomic data to the new RSM zones, creating an RSM-specific socioeconomic file. Next, the module aggregates the following input files:

    - -

    Each of the above files has its own aggregation methodology. In some cases, the aggregation is based on mean, in some cases, it’s the total value, or in some cases, it’s the maximum values.

    +
    | File Name | Aggregation Columns | Aggregation Methodology |
    |---|---|---|
    | microMgraEquivMinutes.csv | walkTime, dist, mmTime, mmCost, mtTime, mtCost, mmGenTime, mtGenTime, minTime | Mapped MGRA to RSM zones and aggregated the columns by taking the mean. |
    | microMgraTapEquivMinutes.csv | walkTime, dist, mmTime, mmCost, mtTime, mtCost, mmGenTime, mtGenTime, minTime | Mapped MGRA to RSM zones and aggregated the columns by taking the mean. |
    | walkMgraTapEquivMinutes.csv | boardingPerceived, boardingActual, alightingPerceived, alightingActual, boardingGain, alightingGain | Mapped MGRA to RSM zones and aggregated the columns by taking the mean. |
    | walkMgraEquivMinutes.csv | percieved, actual, gain | Mapped MGRA to RSM zones and aggregated the columns by taking the mean. |
    | bikeTazLogsum.csv | logsum, time | Mapped TAZ to RSM zones and aggregated the columns by taking the mean. |
    | bikeMgraLogsum.csv | logsum, time | Mapped MGRA to RSM zones and aggregated the columns by taking the mean. |
    | zone.term | terminal_time | Mapped TAZ to RSM zones and took the maximum. |
    | zones.park | park_zones | Mapped TAZ to RSM zones and took the maximum. |
    | tap.ptype | | Mapped RSM zones to TAZs. |
    | accessam.csv | TIME, DISTANCE | |
    | ParkLocationAlts.csv | parkarea | Mapped MGRA to RSM zones and took the minimum. |
    | CrossBorderDestinationChoiceSoaAlternatives.csv | | Mapped RSM zones to MGRA. |
    | TourDcSoaDistanceAlts.csv | a, mgra | Recreated with RSM zones. |
    | DestinationChoiceAlternatives.csv | a, mgra | Recreated with RSM zones. |
    | SoaTazDistAlts.csv | a, dest | Recreated with RSM zones. |
    | TripMatrices.csv | CVM_<>:LT, CVM_<>:IT, CVM_<>:MT, CVM_<>:HT, CVM_<>:LNT, CVM_<>:INT, CVM_<>:MNT, CVM_<>:HNT, where TIME PERIOD = EA, AM, MD, PM, EV | Mapped TAZ to RSM zones and aggregated the columns by taking the sum. |
    | transponderModelAccessibilities.csv | DIST, AVGTTS, PCTDETOUR | Mapped TAZ to RSM zones and aggregated the columns by taking the mean. |
    | crossBorderTours.csv | | Mapped MGRA/TAZs to RSM zones. |
    | internalExternalTrips.csv | | Mapped MGRA/TAZs to RSM zones. |
    | visitorTours.csv | | Mapped MGRA to RSM zones. |
    | visitorTrips.csv | | Mapped MGRA to RSM zones. |
    | householdAVTrips.csv | | Mapped MGRA to RSM zones. |
    | airport_out.SAN.csv | | Mapped MGRA/TAZ to RSM zones. |
    | airport_out.CBX.csv | | Mapped MGRA/TAZ to RSM zones. |
    | TNCtrips.csv | | Mapped MGRA/TAZ to RSM zones. |
    | TRIP_<>_<>.CSV, where SECTOR_TYPE = FA, GO, IN, RE, SV, TH, WH and TIME_PERIOD = OE, AM, MD, PM, OL | | Mapped TAZ to RSM zones. |
    +

    More details on the above files can be found here.
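Most rows in the aggregation table follow one pattern: relabel each MGRA or TAZ column through the crosswalk, then aggregate the value columns within each resulting RSM zone (or RSM zone pair) using the method listed for that file. The following is a minimal, dependency-free sketch of that pattern under stated assumptions; the function name `aggregate_to_rsm`, the crosswalk dictionaries, the zone column names (`i`, `j`, `taz`), and the sample rows are all hypothetical and are not the API of the actual module (`rsm.input_agg.agg_input_files`), which reads the files from disk.

```python
from collections import defaultdict
from statistics import mean

# Aggregation methods named in the table (mean, sum, maximum, minimum).
AGG_FUNCS = {"mean": mean, "sum": sum, "max": max, "min": min}


def aggregate_to_rsm(rows, zone_cols, zone_to_rsm, how):
    """Relabel zone columns via the crosswalk and aggregate value columns.

    rows        -- list of dicts, one per record of the input file
    zone_cols   -- keys holding MGRA/TAZ ids (e.g. ["i", "j"] for zone pairs)
    zone_to_rsm -- crosswalk dict: MGRA or TAZ id -> RSM zone id
    how         -- value column -> "mean" | "sum" | "max" | "min"
    The zone columns in the result hold RSM zone ids.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(zone_to_rsm[row[c]] for c in zone_cols)
        for col in how:
            groups[key][col].append(row[col])

    result = []
    for key in sorted(groups):
        out = dict(zip(zone_cols, key))
        for col, method in how.items():
            out[col] = AGG_FUNCS[method](groups[key][col])
        result.append(out)
    return result


# Hypothetical crosswalks: MGRA -> RSM zone and TAZ -> RSM zone.
mgra_to_rsm = {1: 13, 2: 13, 3: 14, 4: 14}
taz_to_rsm = {1: 13, 2: 14, 3: 14}

# MGRA-pair file (columns like microMgraEquivMinutes.csv): mean per RSM pair.
walk = [
    {"i": 1, "j": 3, "walkTime": 10.0, "dist": 0.5},
    {"i": 2, "j": 4, "walkTime": 14.0, "dist": 0.7},
    {"i": 1, "j": 2, "walkTime": 4.0, "dist": 0.2},
]
walk_rsm = aggregate_to_rsm(walk, ["i", "j"], mgra_to_rsm,
                            {"walkTime": "mean", "dist": "mean"})

# TAZ file (like zone.term): keep the largest terminal time per RSM zone.
term = [{"taz": 1, "terminal_time": 5}, {"taz": 2, "terminal_time": 3},
        {"taz": 3, "terminal_time": 7}]
term_rsm = aggregate_to_rsm(term, ["taz"], taz_to_rsm, {"terminal_time": "max"})
print(walk_rsm, term_rsm)
```

One consequence of this design is visible in the toy data: MGRA pairs that fall in the same RSM zone (here MGRAs 1 and 2) collapse into an intrazonal RSM pair, (13, 13).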

    Translate Demand

    The translate demand module of the RSM aggregates the non-resident demand matrices and trip tables to the new RSM zone structure. The inputs of this module include the path to the RSM model directory, the donor model directory, and the crosswalks. In particular, the module aggregates the auto, transit, non-motorized, and other trip demand from the airport, cross-border, internal-external, and visitor models. It also aggregates TNC vehicle trips and empty AV trips.
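Aggregating a demand matrix differs from aggregating skims: trips are additive, so cells are presumably summed rather than averaged (consistent with the sum used for TripMatrices.csv in the input aggregator), which preserves total demand. A minimal sketch, assuming a square TAZ-by-TAZ matrix held as nested lists and a hypothetical `taz_to_rsm` crosswalk dictionary; the function name `aggregate_od_matrix` is illustrative only.

```python
def aggregate_od_matrix(matrix, taz_ids, taz_to_rsm):
    """Sum a square TAZ-by-TAZ trip matrix into an RSM-by-RSM matrix.

    matrix     -- nested lists, matrix[a][b] = trips from taz_ids[a] to taz_ids[b]
    taz_ids    -- TAZ id for each row/column position
    taz_to_rsm -- hypothetical crosswalk dict: TAZ id -> RSM zone id
    Returns (sorted RSM zone ids, aggregated matrix in that order).
    """
    rsm_ids = sorted({taz_to_rsm[t] for t in taz_ids})
    pos = {z: k for k, z in enumerate(rsm_ids)}
    out = [[0.0] * len(rsm_ids) for _ in rsm_ids]
    for a, orig in enumerate(taz_ids):
        r = pos[taz_to_rsm[orig]]
        for b, dest in enumerate(taz_ids):
            # Every TAZ-to-TAZ cell lands in exactly one RSM-to-RSM cell.
            out[r][pos[taz_to_rsm[dest]]] += matrix[a][b]
    return rsm_ids, out


# Hypothetical example: TAZs 1 and 2 merge into RSM zone 13, TAZ 3 becomes 14.
taz_ids = [1, 2, 3]
taz_to_rsm = {1: 13, 2: 13, 3: 14}
trips = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
rsm_ids, rsm_trips = aggregate_od_matrix(trips, taz_ids, taz_to_rsm)
print(rsm_ids, rsm_trips)
```

Trip tables (one record per trip) need no summation at this step; relabeling their origin and destination zone columns through the crosswalk is enough.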

    Intelligent Sampler

    -

    The intelligent sampler module is designed to intelligently sample households and persons from synthetic households and person data, considering accessibility metrics and other parameters. The main inputs to this module are the households file, person file, TAZ/MGRA crosswalks and the outputs are sampled households and person files. In the model properties file (sandag_abm.properties), the user can choose to run RSM sampler, specify the default sampling rate and minimum sampling rate for the RSM model run. The user also has the ability to sample specific zones at 100% by specifying them in the study area file.

    +

    The intelligent sampler module is designed to sample households and persons from the synthetic household and person data, taking accessibility metrics and other parameters into account. The main inputs to this module are the households file, the person file, and the TAZ/MGRA crosswalks; the outputs are the sampled household and person files. In the model properties file (sandag_abm.properties), the user can choose whether to run the RSM sampler and can specify the default and minimum sampling rates for the RSM model run. The user can also sample specific zones at 100% by listing them in the study area file and turning on the differential sampling indicator (setting use.differential.sampling to 1).

    The sampler function follows these primary steps:

    1. -

      Zone Mapping: The function maps zones from the synthetic households/person data to their corresponding RSM zones using crosswalk data.

      +

      Zone Mapping: The function maps zones from the synthetic households/person data to their corresponding RSM zones using crosswalk data.

    2. -

      Household Sampling:

      +

      Household Sampling:

      • If accessibility data is missing (first iteration) or if the RSM sampler is turned off, a default sampling rate is applied to all RSM zones, with optional 100% sampling in the study area.
      • -
      • If accessibility data is available and the RSM sampler is turned on, the function calculates differences in accessibility metrics between the current and previous iterations. The sampling rates are determined based on these differences and are adjusted to be within specified bounds. The RSM zones of the study area are sampled at a 100% sampling rate.
      • +
      • If accessibility data is available and the RSM sampler is turned on, the function calculates differences in accessibility metrics between the current and previous iterations. The sampling rates are determined based on these differences and are adjusted to be within specified bounds. The RSM zones of the study area are sampled at a 100% sampling rate if the differential sampling indicator is turned on.
    3. -

      Households and Persons Selection: The function selects households based on the calculated sampling rates. It also selects persons associated with the sampled households.

      +

      Households and Persons Selection: The function selects households based on the calculated sampling rates. It also selects persons associated with the sampled households.

    4. -

      Output:

      +

      Output:

      • The selected households and persons are written to output CSV files in the specified output directory.
      • The function also computes and logs the total sampling rate, representing the proportion of selected households relative to the total number of households.
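The sampler steps above can be sketched as follows. The rule that converts accessibility differences into sampling rates is not documented here, so the relative-change scaling below is a hypothetical placeholder, as are the function names, the field names (`hhid`, `mgra`, `pnum`), and the [min_rate, 1.0] clipping bounds. Only the overall flow follows the description: map zones to RSM zones, set per-zone rates, force study-area zones to 100% when differential sampling is on, draw households, keep their persons, and report the total sampling rate.

```python
import random


def sampling_rates(rsm_zones, default_rate, min_rate, access_prev=None,
                   access_curr=None, study_area=(), differential=False):
    """Per-RSM-zone household sampling rates.

    Pass access_prev/access_curr as None for the first iteration or when the
    sampler is turned off; every zone then gets the default rate.
    """
    rates = {}
    for z in rsm_zones:
        if access_prev is None or access_curr is None:
            rate = default_rate
        else:
            # HYPOTHETICAL rule: scale the default rate by the relative change
            # in accessibility between the previous and current iteration.
            prev = access_prev[z]
            change = abs(access_curr[z] - prev) / max(abs(prev), 1e-9)
            rate = default_rate * (1.0 + change)
        rate = min(max(rate, min_rate), 1.0)  # keep within assumed bounds
        if differential and z in study_area:
            rate = 1.0  # study-area zones are fully sampled
        rates[z] = rate
    return rates


def sample_households(households, persons, zone_to_rsm, rates, seed=0):
    """Draw households at their RSM zone's rate and keep their persons."""
    rng = random.Random(seed)
    chosen = [hh for hh in households
              if rng.random() < rates[zone_to_rsm[hh["mgra"]]]]
    chosen_ids = {hh["hhid"] for hh in chosen}
    chosen_persons = [p for p in persons if p["hhid"] in chosen_ids]
    total_rate = len(chosen) / len(households)  # share of households kept
    return chosen, chosen_persons, total_rate


# Hypothetical data: three MGRAs in two RSM zones, two persons per household.
mgra_to_rsm = {101: 13, 102: 13, 201: 14}
households = [{"hhid": k, "mgra": m} for k, m in enumerate([101, 102, 201] * 100)]
persons = [{"hhid": hh["hhid"], "pnum": n} for hh in households for n in (1, 2)]

rates = sampling_rates([13, 14], default_rate=0.25, min_rate=0.10,
                       study_area={14}, differential=True)
hh_s, per_s, total = sample_households(households, persons, mgra_to_rsm, rates)
print(rates, len(hh_s), round(total, 3))
```

Because `random.Random.random()` returns values in [0.0, 1.0), a zone with a rate of 1.0 always keeps every household, which is what the study-area override relies on.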
      • @@ -807,7 +919,7 @@

        Intelligent Assembler + + Use Cases and Key Limitations + +

      @@ -557,6 +564,13 @@ Introduction +
    5. + +
    6. + + Use Cases and Key Limitations + +
    7. @@ -587,6 +601,16 @@

      Introduction

      The computational time of ABM2+, and the likely computational time of the successor to ABM2+ (ABM3), hinders SANDAG’s ability to carry out certain analyses in a timely manner. For example, if an analyst wants to explore 10 different roadway pricing schemes for a select corridor, a month of computation time would be required.

      SANDAG requires a tool capable of quickly approximating the outcomes of ABM2+. Therefore, a tool was built for this purpose, referred to henceforth as the Rapid Strategic Model (RSM). The primary objective of the RSM was to enhance the speed of the resident passenger component within the broader modeling system and produce results that closely aligned with ABM2+ for policy planning requirements.

      +

      Use Cases and Key Limitations

      +

      Based on the set of tests conducted as part of this project, RSM performs well for regional-scale roadway projects (e.g., auto operating costs and mileage fees, TNC costs and wait times) and regional-scale transit projects (e.g., transit fares and headway changes). RSM also performed well for land-use change policies. Lastly, RSM was tested for local roadway changes (e.g., managed lane conversion) and local transit changes (e.g., a new BRT line), and the results indicate that these policies are also reasonably represented by RSM.

      +

      Here are some of the current limitations of RSM:

      +
        +
      • The scope of the RSM is “passenger” travel. Policies and/or infrastructure that primarily impact commercial travel (e.g., truck lanes) will not be well represented.
      • +
      • Minor recalibration of mode choice was necessary to match observed walk trips. Large changes to the number of zones will likely require further recalibration.
      • +
      • The spatial aggregation reduces the RSM’s ability to represent or simulate infrastructure and/or policies that act at small scales (e.g., pedestrian infrastructure).
      • +
      • Policies related to the adoption of automated vehicles cannot be represented at present. RSM currently skips running the Household AV Allocation module.
      • +
      • While the RSM has been tested, the testing has not been extensive, and more extensive testing is likely to surface additional issues. Additional testing will be required to evaluate whether RSM is a viable tool for other policies of interest to SANDAG.
      • +
      diff --git a/index.md b/index.md index b1a31df..5d6ceb3 100644 --- a/index.md +++ b/index.md @@ -11,4 +11,15 @@ The travel demand model SANDAG used for the 2021 regional plan, referred to as A The computational time of ABM2+, and the likely computational time of the successor to ABM2+ (ABM3), hinders SANDAG's ability to carry out certain analyses in a timely manner. For example, if an analyst wants to explore 10 different roadway pricing schemes for a select corridor, a month of computation time would be required. -SANDAG requires a tool capable of quickly approximating the outcomes of ABM2+. Therefore, a tool was built for this purpose, referred to henceforth as the Rapid Strategic Model (RSM). The primary objective of the RSM was to enhance the speed of the resident passenger component within the broader modeling system and produce results that closely aligned with ABM2+ for policy planning requirements. \ No newline at end of file +SANDAG requires a tool capable of quickly approximating the outcomes of ABM2+. Therefore, a tool was built for this purpose, referred to henceforth as the Rapid Strategic Model (RSM). The primary objective of the RSM was to enhance the speed of the resident passenger component within the broader modeling system and produce results that closely aligned with ABM2+ for policy planning requirements. + +## Use Cases and Key Limitations +Based on set of tests done as part of this project, RSM performs well for regional scale roadway projects (e.g., auto operating costs and mileage fee, TNC costs and wait times etc.) and regional scale transit projects (transit fare, headway changes etc.). RSM also performed well for land-use change policies. Lastly, RSM was also tested for local roadway changes (e.g., managed lanes conversion) and local transit changes (e.g., new BRT line), and the results indicate that those policies are reasonably represented by RSM as well. 
+ +Here are some of the current limitations of RSM: + +- The scope of the RSM is “passenger” travel. Policies and/or infrastructure that primarily impact commercial travel (e.g., truck lanes) will not be well represented. +- Minor re-calibration of the mode choice was necessary to match observed walk trips. Large changes to the number of zones will likely require recalibration. +- The spatial aggregation reduces the RSM’s ability to represent to simulate infrastructure and/or policies that act at small scales (e.g., pedestrian infrastructure). +- Policies related to the adoption of automated vehicles cannot be currently represented. RSM currently skips running the Household AV Allocation module. +- While the RSM has been tested, the testing has not been extensive. More extensive testing is likely to surface additional issues. Additional testing will be required to evaluate if RSM can be a viable tool for other policies that interests SANDAG. \ No newline at end of file diff --git a/search/search_index.json b/search/search_index.json index 50b0c6c..fc5e87c 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"index.html","title":"SANDAG Rapid Strategic Model","text":"

      Welcome to the SANDAG Rapid Strategic Model documentation site!

      "},{"location":"index.html#introduction","title":"Introduction","text":"

      The travel demand model SANDAG used for the 2021 regional plan, referred to as ABM2+, is one of the most sophisticated modeling tools used anywhere in the world. Its activity-based approach to representing travel is behaviorally rich; the representations of land development and transportation infrastructure are represented in high fidelity spatial detail. An operational shortcoming of ABM2+ is it requires significant computational resources to carry out a simulation. A typical forecast year simulation of ABM2+ takes over 40 hours to complete on a high end workstation (e.g., 48 physical computing cores and 256 gigabytes of RAM). The components of this runtime include:

      • Three iterations of the resident activity-based model, each about 6 hours
      • Four iterations of roadway and transit assignment, with each iteration taking about 90 minutes

      The computational time of ABM2+, and the likely computational time of the successor to ABM2+ (ABM3), hinders SANDAG\u2019s ability to carry out certain analyses in a timely manner. For example, if an analyst wants to explore 10 different roadway pricing schemes for a select corridor, a month of computation time would be required.

      SANDAG requires a tool capable of quickly approximating the outcomes of ABM2+. Therefore, a tool was built for this purpose, referred to henceforth as the Rapid Strategic Model (RSM). The primary objective of the RSM was to enhance the speed of the resident passenger component within the broader modeling system and produce results that closely aligned with ABM2+ for policy planning requirements.

      "},{"location":"api.html","title":"Application Programming Interface","text":""},{"location":"api.html#rsm.zone_agg.aggregate_zones","title":"aggregate_zones(mgra_gdf, method='kmeans', n_zones=2000, random_state=0, cluster_factors=None, cluster_factors_onehot=None, use_xy=True, explicit_agg=(), explicit_col='mgra', agg_instruction=None, start_cluster_ids=13)","text":"

      Aggregate zones.

      "},{"location":"api.html#rsm.zone_agg.aggregate_zones--parameters","title":"Parameters","text":"

      mgra_gdf : mgra_gdf (GeoDataFrame) Geometry and attibutes of MGRAs method : method (array) default {\u2018kmeans\u2019, \u2018agglom\u2019, \u2018agglom_adj\u2019} n_zones : n_zones (int) random_state : random_state (RandomState or int) cluster_factors : cluster_factors (dict) cluster_factors_onehot : cluster_factors_onehot (dict) use_xy : use_xy (bool or float) Use X and Y coordinates as a cluster factor, use a float to scale the x-y coordinates from the CRS if needed. explicit_agg : explicit_agg (list[int or list]) A list containing integers (individual MGRAs that should not be aggregated) or lists of integers (groups of MGRAs that should be aggregated exactly as given, with no less and no more) explicit_col : explicit_col (str) The name of the column containing the ID\u2019s from explicit_agg, usually \u2018mgra\u2019 or \u2018taz\u2019 agg_instruction : agg_instruction (dict) Dictionary passed to pandas agg that says how to aggregate data columns. start_cluster_ids : start_cluster_ids (int, default 13) Cluster id\u2019s start at this value. Can be 1, but typically SANDAG has the smallest id\u2019s reserved for external zones, so starting at a greater value is typical.

      "},{"location":"api.html#rsm.zone_agg.aggregate_zones--returns","title":"Returns","text":"

      GeoDataFrame

      Source code in rsm/zone_agg.py
      def aggregate_zones(\n    mgra_gdf,\n    method=\"kmeans\",\n    n_zones=2000,\n    random_state=0,\n    cluster_factors=None,\n    cluster_factors_onehot=None,\n    use_xy=True,\n    explicit_agg=(),\n    explicit_col=\"mgra\",\n    agg_instruction=None,\n    start_cluster_ids=13,\n):\n\"\"\"\n    Aggregate zones.\n\n    Parameters\n    ----------\n    mgra_gdf : mgra_gdf (GeoDataFrame)\n        Geometry and attibutes of MGRAs\n    method : method (array)\n        default {'kmeans', 'agglom', 'agglom_adj'}\n    n_zones : n_zones (int)\n    random_state : random_state (RandomState or int)\n    cluster_factors : cluster_factors (dict)\n    cluster_factors_onehot : cluster_factors_onehot (dict)\n    use_xy : use_xy (bool or float)\n        Use X and Y coordinates as a cluster factor, use a float to scale the\n        x-y coordinates from the CRS if needed.\n    explicit_agg : explicit_agg (list[int or list])\n        A list containing integers (individual MGRAs that should not be aggregated)\n        or lists of integers (groups of MGRAs that should be aggregated exactly as\n        given, with no less and no more)\n    explicit_col : explicit_col (str)\n        The name of the column containing the ID's from `explicit_agg`, usually\n        'mgra' or 'taz'\n    agg_instruction : agg_instruction (dict)\n        Dictionary passed to pandas `agg` that says how to aggregate data columns.\n    start_cluster_ids : start_cluster_ids (int, default 13)\n        Cluster id's start at this value.  
Can be 1, but typically SANDAG has the\n        smallest id's reserved for external zones, so starting at a greater value\n        is typical.\n\n    Returns\n    -------\n    GeoDataFrame\n    \"\"\"\n\n    if cluster_factors is None:\n        cluster_factors = {}\n\n    n = start_cluster_ids\n    if explicit_agg:\n        explicit_agg_ids = {}\n        for i in explicit_agg:\n            if isinstance(i, Number):\n                explicit_agg_ids[i] = n\n            else:\n                for j in i:\n                    explicit_agg_ids[j] = n\n            n += 1\n        if explicit_col == mgra_gdf.index.name:\n            mgra_gdf = mgra_gdf.reset_index()\n            mgra_gdf.index = mgra_gdf[explicit_col]\n        in_explicit = mgra_gdf[explicit_col].isin(explicit_agg_ids)\n        mgra_gdf_algo = mgra_gdf.loc[~in_explicit].copy()\n        mgra_gdf_explicit = mgra_gdf.loc[in_explicit].copy()\n        mgra_gdf_explicit[\"cluster_id\"] = mgra_gdf_explicit[explicit_col].map(\n            explicit_agg_ids\n        )\n        n_zones_algorithm = n_zones - len(\n            mgra_gdf_explicit[\"cluster_id\"].value_counts()\n        )\n    else:\n        mgra_gdf_algo = mgra_gdf.copy()\n        mgra_gdf_explicit = None\n        n_zones_algorithm = n_zones\n\n    if use_xy:\n        geometry = mgra_gdf_algo.centroid\n        X = list(geometry.apply(lambda p: p.x))\n        Y = list(geometry.apply(lambda p: p.y))\n        factors = [np.asarray(X) * use_xy, np.asarray(Y) * use_xy]\n    else:\n        factors = []\n    for cf, cf_wgt in cluster_factors.items():\n        factors.append(cf_wgt * mgra_gdf_algo[cf].values.astype(np.float32))\n    if cluster_factors_onehot:\n        for cf, cf_wgt in cluster_factors_onehot.items():\n            factors.append(cf_wgt * OneHotEncoder().fit_transform(mgra_gdf_algo[[cf]]))\n        from scipy.sparse import hstack\n\n        factors2d = []\n        for j in factors:\n            if j.ndim < 2:\n                
factors2d.append(np.expand_dims(j, -1))\n            else:\n                factors2d.append(j)\n        data = hstack(factors2d).toarray()\n    else:\n        data = np.array(factors).T\n\n    if method == \"kmeans\":\n        kmeans = KMeans(n_clusters=n_zones_algorithm, random_state=random_state)\n        kmeans.fit(data)\n        cluster_id = kmeans.labels_\n    elif method == \"agglom\":\n        agglom = AgglomerativeClustering(\n            n_clusters=n_zones_algorithm, affinity=\"euclidean\", linkage=\"ward\"\n        )\n        agglom.fit_predict(data)\n        cluster_id = agglom.labels_\n    elif method == \"agglom_adj\":\n        from libpysal.weights import Rook\n\n        w_rook = Rook.from_dataframe(mgra_gdf_algo)\n        adj_mat = nx.adjacency_matrix(w_rook.to_networkx())\n        agglom = AgglomerativeClustering(\n            n_clusters=n_zones_algorithm,\n            affinity=\"euclidean\",\n            linkage=\"ward\",\n            connectivity=adj_mat,\n        )\n        agglom.fit_predict(data)\n        cluster_id = agglom.labels_\n    else:\n        raise NotImplementedError(method)\n    mgra_gdf_algo[\"cluster_id\"] = cluster_id\n\n    if mgra_gdf_explicit is None or len(mgra_gdf_explicit) == 0:\n        combined = merge_zone_data(\n            mgra_gdf_algo,\n            agg_instruction,\n            cluster_id=\"cluster_id\",\n        )\n        combined[\"cluster_id\"] = list(range(n, n + n_zones_algorithm))\n    else:\n        pending = []\n        for df in [mgra_gdf_algo, mgra_gdf_explicit]:\n            logger.info(f\"... 
merging {len(df)}\")\n            pending.append(\n                merge_zone_data(\n                    df,\n                    agg_instruction,\n                    cluster_id=\"cluster_id\",\n                ).reset_index()\n            )\n\n        pending[0][\"cluster_id\"] = list(range(n, n + n_zones_algorithm))\n\n        pending[0] = pending[0][\n            [c for c in pending[1].columns if c in pending[0].columns]\n        ]\n        pending[1] = pending[1][\n            [c for c in pending[0].columns if c in pending[1].columns]\n        ]\n        combined = pd.concat(pending, ignore_index=False)\n    combined = combined.reset_index(drop=True)\n\n    return combined\n
      "},{"location":"api.html#rsm.input_agg.agg_input_files","title":"agg_input_files(model_dir='.', rsm_dir='.', taz_cwk_file='taz_crosswalk.csv', mgra_cwk_file='mgra_crosswalk.csv', agg_zones=2000, ext_zones=12, input_files=['microMgraEquivMinutes.csv', 'microMgraTapEquivMinutes.csv', 'walkMgraTapEquivMinutes.csv', 'walkMgraEquivMinutes.csv', 'bikeTazLogsum.csv', 'bikeMgraLogsum.csv', 'zone.term', 'zones.park', 'tap.ptype', 'accessam.csv', 'ParkLocationAlts.csv', 'CrossBorderDestinationChoiceSoaAlternatives.csv', 'TourDcSoaDistanceAlts.csv', 'DestinationChoiceAlternatives.csv', 'SoaTazDistAlts.csv', 'TripMatrices.csv', 'transponderModelAccessibilities.csv', 'crossBorderTours.csv', 'internalExternalTrips.csv', 'visitorTours.csv', 'visitorTrips.csv', 'householdAVTrips.csv', 'crossBorderTrips.csv', 'TNCTrips.csv', 'airport_out.SAN.csv', 'airport_out.CBX.csv', 'TNCtrips.csv'])","text":""},{"location":"api.html#rsm.input_agg.agg_input_files--parameters","title":"Parameters","text":"

      model_dir : model_dir (path_like) path to full model run, default \u201c.\u201d
      rsm_dir : rsm_dir (path_like) path to RSM, default \u201c.\u201d
      taz_cwk_file : taz_cwk_file (csv file) TAZ-to-aggregated-zone crosswalk, default taz_crosswalk.csv. Should be located in the RSM input folder.
      mgra_cwk_file : mgra_cwk_file (csv file) MGRA-to-aggregated-zone crosswalk, default mgra_crosswalk.csv. Should be located in the RSM input folder.
      agg_zones : agg_zones (int) number of aggregated internal zones, default 2000
      ext_zones : ext_zones (int) number of external zones, default 12; the aggregated TAZ count is agg_zones + ext_zones
      input_files : input_files (csv + other files) list of input files to be aggregated. Should include the following files: \u201cmicroMgraEquivMinutes.csv\u201d, \u201cmicroMgraTapEquivMinutes.csv\u201d, \u201cwalkMgraTapEquivMinutes.csv\u201d, \u201cwalkMgraEquivMinutes.csv\u201d, \u201cbikeTazLogsum.csv\u201d, \u201cbikeMgraLogsum.csv\u201d, \u201czone.term\u201d, \u201czones.park\u201d, \u201ctap.ptype\u201d, \u201caccessam.csv\u201d, \u201cParkLocationAlts.csv\u201d, \u201cCrossBorderDestinationChoiceSoaAlternatives.csv\u201d, \u201cTourDcSoaDistanceAlts.csv\u201d, \u201cDestinationChoiceAlternatives.csv\u201d, \u201cSoaTazDistAlts.csv\u201d, \u201cTripMatrices.csv\u201d, \u201ctransponderModelAccessibilities.csv\u201d, \u201ccrossBorderTours.csv\u201d, \u201cinternalExternalTrips.csv\u201d, \u201cvisitorTours.csv\u201d, \u201cvisitorTrips.csv\u201d, \u201chouseholdAVTrips.csv\u201d, \u201ccrossBorderTrips.csv\u201d, \u201cTNCTrips.csv\u201d, \u201cairport_out.SAN.csv\u201d, \u201cairport_out.CBX.csv\u201d, \u201cTNCtrips.csv\u201d

      "},{"location":"api.html#rsm.input_agg.agg_input_files--returns","title":"Returns","text":"

      None. The aggregated files are written to the RSM input, output, and uec directories.
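Files keyed on MGRA pairs, such as walkMgraEquivMinutes.csv, are aggregated in two steps. First, each i and j is mapped through the MGRA crosswalk. Then the attributes of all original pairs that fall into the same aggregated pair are averaged. The minimal sketch below shows that step on toy data. The crosswalk and attribute values are made up for illustration:

```python
import pandas as pd

# Toy MGRA crosswalk: MGRAs 1 and 2 collapse into aggregated zone 1,
# MGRA 3 becomes aggregated zone 2 (illustrative values only).
mgra_cwk = {1: 1, 2: 1, 3: 2}

df = pd.DataFrame({
    'i': [1, 2, 1, 3],
    'j': [3, 3, 2, 1],
    'actual': [10.0, 14.0, 2.0, 8.0],
    'gain': [1.0, 3.0, 0.0, 2.0],
})

# Relabel origins and destinations with aggregated zone ids.
df['i'] = df['i'].map(mgra_cwk)
df['j'] = df['j'].map(mgra_cwk)

# Average each attribute over the original pairs sharing an aggregated pair.
# The columns are selected with a list; pandas 2.x rejects the tuple form ['a', 'b'].
agg = df.groupby(['i', 'j'])[['actual', 'gain']].mean().reset_index()
print(agg)
```

Averaging suits time, distance, and logsum attributes. Trip tables such as TripMatrices.csv are summed instead.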

      Source code in rsm/input_agg.py
      def agg_input_files(\n    model_dir = \".\", \n    rsm_dir = \".\",\n    taz_cwk_file = \"taz_crosswalk.csv\",\n    mgra_cwk_file = \"mgra_crosswalk.csv\",\n    agg_zones=2000,\n    ext_zones=12,\n    input_files = [\"microMgraEquivMinutes.csv\", \"microMgraTapEquivMinutes.csv\", \n    \"walkMgraTapEquivMinutes.csv\", \"walkMgraEquivMinutes.csv\", \"bikeTazLogsum.csv\",\n    \"bikeMgraLogsum.csv\", \"zone.term\", \"zones.park\", \"tap.ptype\", \"accessam.csv\",\n    \"ParkLocationAlts.csv\", \"CrossBorderDestinationChoiceSoaAlternatives.csv\", \n    \"TourDcSoaDistanceAlts.csv\", \"DestinationChoiceAlternatives.csv\", \"SoaTazDistAlts.csv\",\n    \"TripMatrices.csv\", \"transponderModelAccessibilities.csv\", \"crossBorderTours.csv\", \n    \"internalExternalTrips.csv\", \"visitorTours.csv\", \"visitorTrips.csv\", \"householdAVTrips.csv\", \n    \"crossBorderTrips.csv\", \"TNCTrips.csv\", \"airport_out.SAN.csv\", \"airport_out.CBX.csv\", \n    \"TNCtrips.csv\"]\n    ):\n\n\"\"\"\n        Parameters\n        ----------\n        model_dir : model_dir (path_like)\n            path to full model run, default \".\"\n        rsm_dir : rsm_dir (path_like)\n            path to RSM, default \".\"\n        taz_cwk_file : taz_cwk_file (csv file)\n            default taz_crosswalk.csv\n            taz to aggregated zones file. Should be located in RSM input folder\n        mgra_cwk_file : mgra_cwk_file (csv file)\n            default mgra_crosswalk.csv\n            mgra to aggregated zones file. Should be located in RSM input folder\n        input_files : input_files (csv + other files)\n            list of input files to be aggregated. 
\n            Should include the following files\n                \"microMgraEquivMinutes.csv\", \"microMgraTapEquivMinutes.csv\", \n                \"walkMgraTapEquivMinutes.csv\", \"walkMgraEquivMinutes.csv\", \"bikeTazLogsum.csv\",\n                \"bikeMgraLogsum.csv\", \"zone.term\", \"zones.park\", \"tap.ptype\", \"accessam.csv\",\n                \"ParkLocationAlts.csv\", \"CrossBorderDestinationChoiceSoaAlternatives.csv\",\n                \"TourDcSoaDistanceAlts.csv\", \"DestinationChoiceAlternatives.csv\", \"SoaTazDistAlts.csv\",\n                \"TripMatrices.csv\", \"transponderModelAccessibilities.csv\", \"crossBorderTours.csv\",\n                \"internalExternalTrips.csv\", \"visitorTours.csv\", \"visitorTrips.csv\", \"householdAVTrips.csv\",\n                \"crossBorderTrips.csv\", \"TNCTrips.csv\", \"airport_out.SAN.csv\", \"airport_out.CBX.csv\",\n                \"TNCtrips.csv\"\n\n        Returns\n        -------\n        Aggregated files in the RSM input/output/uec directory\n    \"\"\"\n\n    df_clusters = pd.read_csv(os.path.join(rsm_dir, \"input\", taz_cwk_file))\n    df_clusters.columns= df_clusters.columns.str.strip().str.lower()\n    dict_clusters = dict(zip(df_clusters['taz'], df_clusters['cluster_id']))\n\n    mgra_cwk = pd.read_csv(os.path.join(rsm_dir, \"input\", mgra_cwk_file))\n    mgra_cwk.columns= mgra_cwk.columns.str.strip().str.lower()\n    mgra_cwk = dict(zip(mgra_cwk['mgra'], mgra_cwk['cluster_id']))\n\n    taz_zones = int(agg_zones) + int(ext_zones)\n    mgra_zones = int(agg_zones)\n\n    # aggregating microMgraEquivMinutes.csv\n    if \"microMgraEquivMinutes.csv\" in input_files:\n        logging.info(\"Aggregating - microMgraEquivMinutes.csv\")\n        df_mm_eqmin = pd.read_csv(os.path.join(model_dir, \"output\", \"microMgraEquivMinutes.csv\"))\n        df_mm_eqmin['i_new'] = df_mm_eqmin['i'].map(mgra_cwk)\n        df_mm_eqmin['j_new'] = df_mm_eqmin['j'].map(mgra_cwk)\n\n        df_mm_eqmin_agg = 
df_mm_eqmin.groupby(['i_new', 'j_new'])[['walkTime', 'dist', 'mmTime', 'mmCost', 'mtTime', 'mtCost',\n       'mmGenTime', 'mtGenTime', 'minTime']].mean().reset_index()\n\n        df_mm_eqmin_agg = df_mm_eqmin_agg.rename(columns = {'i_new' : 'i', 'j_new' : 'j'})\n        df_mm_eqmin_agg.to_csv(os.path.join(rsm_dir, \"input\", \"microMgraEquivMinutes.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"microMgraEquivMinutes.csv\")\n\n\n    # aggregating microMgraTapEquivMinutes.csv\n    if \"microMgraTapEquivMinutes.csv\" in input_files:\n        logging.info(\"Aggregating - microMgraTapEquivMinutes.csv\")\n        df_mm_tap = pd.read_csv(os.path.join(model_dir, \"output\", \"microMgraTapEquivMinutes.csv\"))\n        df_mm_tap['mgra'] = df_mm_tap['mgra'].map(mgra_cwk)\n\n        df_mm_tap_agg = df_mm_tap.groupby(['mgra', 'tap'])[['walkTime', 'dist', 'mmTime', 'mmCost', 'mtTime',\n       'mtCost', 'mmGenTime', 'mtGenTime', 'minTime']].mean().reset_index()\n\n        df_mm_tap_agg.to_csv(os.path.join(rsm_dir, \"input\", \"microMgraTapEquivMinutes.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"microMgraTapEquivMinutes.csv\")\n\n    # aggregating walkMgraTapEquivMinutes.csv\n    if \"walkMgraTapEquivMinutes.csv\" in input_files:\n        logging.info(\"Aggregating - walkMgraTapEquivMinutes.csv\")\n        df_wlk_mgra_tap = pd.read_csv(os.path.join(model_dir, \"output\", \"walkMgraTapEquivMinutes.csv\"))\n        df_wlk_mgra_tap[\"mgra\"] = df_wlk_mgra_tap[\"mgra\"].map(mgra_cwk)\n\n        df_wlk_mgra_agg = df_wlk_mgra_tap.groupby([\"mgra\", \"tap\"])[[\"boardingPerceived\", \"boardingActual\",\"alightingPerceived\",\"alightingActual\",\"boardingGain\",\"alightingGain\"]].mean().reset_index()\n        df_wlk_mgra_agg.to_csv(os.path.join(rsm_dir, \"input\", \"walkMgraTapEquivMinutes.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"walkMgraTapEquivMinutes.csv\")\n\n    # aggregating walkMgraEquivMinutes.csv\n    if 
\"walkMgraEquivMinutes.csv\" in input_files:\n        logging.info(\"Aggregating - walkMgraEquivMinutes.csv\")\n        df_wlk_min = pd.read_csv(os.path.join(model_dir, \"output\", \"walkMgraEquivMinutes.csv\"))\n        df_wlk_min[\"i\"] = df_wlk_min[\"i\"].map(mgra_cwk)\n        df_wlk_min[\"j\"] = df_wlk_min[\"j\"].map(mgra_cwk)\n\n        df_wlk_min_agg = df_wlk_min.groupby([\"i\", \"j\"])[[\"percieved\",\"actual\", \"gain\"]].mean().reset_index()\n\n        df_wlk_min_agg.to_csv(os.path.join(rsm_dir, \"input\", \"walkMgraEquivMinutes.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"walkMgraEquivMinutes.csv\")\n\n    # aggregating biketazlogsum\n    if \"bikeTazLogsum.csv\" in input_files:\n        logging.info(\"Aggregating - bikeTazLogsum.csv\")\n        bike_taz = pd.read_csv(os.path.join(model_dir, \"output\", \"bikeTazLogsum.csv\"))\n\n        bike_taz[\"i\"] = bike_taz[\"i\"].map(dict_clusters)\n        bike_taz[\"j\"] = bike_taz[\"j\"].map(dict_clusters)\n\n        bike_taz_agg = bike_taz.groupby([\"i\", \"j\"])[[\"logsum\", \"time\"]].mean().reset_index()\n        bike_taz_agg.to_csv(os.path.join(rsm_dir, \"input\", \"bikeTazLogsum.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"bikeTazLogsum.csv\")\n\n    # aggregating bikeMgraLogsum.csv\n    if \"bikeMgraLogsum.csv\" in input_files:\n        logging.info(\"Aggregating - bikeMgraLogsum.csv\")\n        bike_mgra = pd.read_csv(os.path.join(model_dir, \"output\", \"bikeMgraLogsum.csv\"))\n        bike_mgra[\"i\"] = bike_mgra[\"i\"].map(mgra_cwk)\n        bike_mgra[\"j\"] = bike_mgra[\"j\"].map(mgra_cwk)\n\n        bike_mgra_agg = bike_mgra.groupby([\"i\", \"j\"])[[\"logsum\", \"time\"]].mean().reset_index()\n        bike_mgra_agg.to_csv(os.path.join(rsm_dir, \"input\", \"bikeMgraLogsum.csv\"), index = False)\n    else:\n        raise FileNotFoundError(\"bikeMgraLogsum.csv\")\n\n    # aggregating zone.term\n    if \"zone.term\" in input_files:\n        
logging.info(\"Aggregating - zone.term\")\n        df_zone_term = pd.read_fwf(os.path.join(model_dir, \"input\", \"zone.term\"), header = None)\n        df_zone_term.columns = [\"taz\", \"terminal_time\"]\n\n        df_agg = pd.merge(df_zone_term, df_clusters, on = \"taz\", how = 'left')\n        df_zones_agg = df_agg.groupby([\"cluster_id\"])['terminal_time'].max().reset_index()\n\n        df_zones_agg.columns = [\"taz\", \"terminal_time\"]\n        df_zones_agg.to_fwf(os.path.join(rsm_dir, \"input\", \"zone.term\"))\n\n    else:\n        raise FileNotFoundError(\"zone.term\")\n\n    # aggregating zones.park\n    if \"zones.park\" in input_files:\n        logging.info(\"Aggregating - zone.park\")\n        df_zones_park = pd.read_fwf(os.path.join(model_dir, \"input\", \"zone.park\"), header = None)\n        df_zones_park.columns = [\"taz\", \"park_zones\"]\n\n        df_zones_park_agg = pd.merge(df_zones_park, df_clusters, on = \"taz\", how = 'left')\n        df_zones_park_agg = df_zones_park_agg.groupby([\"cluster_id\"])['park_zones'].max().reset_index()\n        df_zones_park_agg.columns = [\"taz\", \"park_zones\"]\n        df_zones_park_agg.to_fwf(os.path.join(rsm_dir, \"input\", \"zone.park\"))\n\n    else:\n        raise FileNotFoundError(\"zone.park\")\n\n\n    # aggregating tap.ptype \n    if \"tap.ptype\" in input_files:\n        logging.info(\"Aggregating - tap.ptype\")\n        df_tap_ptype = pd.read_fwf(os.path.join(model_dir, \"input\", \"tap.ptype\"), header = None)\n        df_tap_ptype.columns = [\"tap\", \"lot id\", \"parking type\", \"taz\", \"capacity\", \"distance\", \"transit mode\"]\n\n        df_tap_ptype = pd.merge(df_tap_ptype, df_clusters, on = \"taz\", how = 'left')\n\n        df_tap_ptype = df_tap_ptype[[\"tap\", \"lot id\", \"parking type\", \"cluster_id\", \"capacity\", \"distance\", \"transit mode\"]]\n        df_tap_ptype = df_tap_ptype.rename(columns = {\"cluster_id\": \"taz\"})\n        #df_tap_ptype.to_fwf(os.path.join(rsm_dir, 
\"input\", \"tap.ptype\"))\n\n        widths = [5, 6, 6, 5, 5, 5, 3]\n\n        with open(os.path.join(rsm_dir, \"input\", \"tap.ptype\"), 'w') as f:\n            for index, row in df_tap_ptype.iterrows():\n                # positional access via .iloc (plain row[0] on a labelled Series is deprecated)\n                field1 = str(row.iloc[0]).rjust(widths[0])\n                field2 = str(row.iloc[1]).rjust(widths[1])\n                field3 = str(row.iloc[2]).rjust(widths[2])\n                field4 = str(row.iloc[3]).rjust(widths[3])\n                field5 = str(row.iloc[4]).rjust(widths[4])\n                field6 = str(row.iloc[5]).rjust(widths[5])\n                field7 = str(row.iloc[6]).rjust(widths[6])\n                f.write(f'{field1}{field2}{field3}{field4}{field5}{field6}{field7}\\n')\n\n    else:\n        raise FileNotFoundError(\"tap.ptype\")\n\n    #aggregating accessam.csv\n    if \"accessam.csv\" in input_files:\n        logging.info(\"Aggregating - accessam.csv\")\n        df_acc = pd.read_csv(os.path.join(model_dir, \"input\", \"accessam.csv\"), header = None)\n        df_acc.columns = ['TAZ', 'TAP', 'TIME', 'DISTANCE', 'MODE']\n\n        df_acc['TAZ'] = df_acc['TAZ'].map(dict_clusters)\n        df_acc_agg = df_acc.groupby(['TAZ', 'TAP', 'MODE'])[['TIME', 'DISTANCE']].mean().reset_index()\n        df_acc_agg = df_acc_agg[[\"TAZ\", \"TAP\", \"TIME\", \"DISTANCE\", \"MODE\"]]\n\n        df_acc_agg.to_csv(os.path.join(rsm_dir, \"input\", \"accessam.csv\"), index = False, header =False)\n    else:\n        raise FileNotFoundError(\"accessam.csv\")\n\n    # aggregating ParkLocationAlts.csv\n    if \"ParkLocationAlts.csv\" in input_files:\n        logging.info(\"Aggregating - ParkLocationAlts.csv\")\n        df_park = pd.read_csv(os.path.join(model_dir, \"uec\", \"ParkLocationAlts.csv\"))\n        df_park['mgra_new'] = df_park[\"mgra\"].map(mgra_cwk)\n        df_park_agg = df_park.groupby([\"mgra_new\"])[\"parkarea\"].min().reset_index() # assuming 1 is \"parking\" and 2 is \"no parking\"\n        df_park_agg['a'] = [i+1 for i in range(len(df_park_agg))]\n\n        
df_park_agg.columns = [\"a\", \"mgra\", \"parkarea\"]\n        df_park_agg.to_csv(os.path.join(rsm_dir, \"uec\", \"ParkLocationAlts.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"ParkLocationAlts.csv\")\n\n    # aggregating CrossBorderDestinationChoiceSoaAlternatives.csv\n    if \"CrossBorderDestinationChoiceSoaAlternatives.csv\" in input_files:\n        logging.info(\"Aggregating - CrossBorderDestinationChoiceSoaAlternatives.csv\")\n        df_cb = pd.read_csv(os.path.join(model_dir, \"uec\",\"CrossBorderDestinationChoiceSoaAlternatives.csv\"))\n\n        df_cb[\"mgra_entry\"] = df_cb[\"mgra_entry\"].map(mgra_cwk)\n        df_cb[\"mgra_return\"] = df_cb[\"mgra_return\"].map(mgra_cwk)\n        df_cb[\"a\"] = df_cb[\"a\"].map(mgra_cwk)\n\n        df_cb = pd.merge(df_cb, df_clusters, left_on = \"dest\", right_on = \"taz\", how = 'left')\n        df_cb = df_cb.drop(columns = [\"dest\", \"taz\"])\n        df_cb = df_cb.rename(columns = {'cluster_id' : 'dest'})\n\n        df_cb_final  = df_cb.drop_duplicates()\n\n        df_cb_final = df_cb_final[[\"a\", \"dest\", \"poe\", \"mgra_entry\", \"mgra_return\", \"poe_taz\"]]\n        df_cb_final.to_csv(os.path.join(rsm_dir, \"uec\", \"CrossBorderDestinationChoiceSoaAlternatives.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"CrossBorderDestinationChoiceSoaAlternatives.csv\")\n\n    # aggregating households.csv\n    if \"households.csv\" in input_files:\n        logging.info(\"Aggregating - households.csv\")\n        df_hh = pd.read_csv(os.path.join(model_dir, \"input\", \"households.csv\"))\n        df_hh[\"mgra\"] = df_hh[\"mgra\"].map(mgra_cwk)\n        df_hh[\"taz\"] = df_hh[\"taz\"].map(dict_clusters)\n\n        df_hh.to_csv(os.path.join(rsm_dir, \"input\", \"households.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"households.csv\")\n\n    # aggregating ShadowPricingOutput_school_9.csv\n    if \"ShadowPricingOutput_school_9.csv\" in input_files:\n        
logging.info(\"Aggregating - ShadowPricingOutput_school_9.csv\")\n        df_sp_sch = pd.read_csv(os.path.join(model_dir, \"input\", \"ShadowPricingOutput_school_9.csv\"))\n\n        agg_instructions = {}\n        for col in df_sp_sch.columns:\n            if \"size\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n            if \"shadowPrices\" in col:\n                agg_instructions.update({col: \"max\"})\n\n            if \"_origins\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n            if \"_modeledDests\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n        df_sp_sch['mgra'] = df_sp_sch['mgra'].map(mgra_cwk)\n        df_sp_sch_agg = df_sp_sch.groupby(['mgra']).agg(agg_instructions).reset_index()\n\n        alt = list(df_sp_sch_agg['mgra'])\n        df_sp_sch_agg.insert(loc=0, column=\"alt\", value=alt)\n        df_sp_sch_agg.loc[len(df_sp_sch_agg.index)] = 0\n\n        df_sp_sch_agg.to_csv(os.path.join(rsm_dir, \"input\", \"ShadowPricingOutput_school_9.csv\"), index=False)\n\n    else:\n        FileNotFoundError(\"ShadowPricingOutput_school_9.csv\")\n\n    # aggregating ShadowPricingOutput_work_9.csv\n    if \"ShadowPricingOutput_work_9.csv\" in input_files:\n        logging.info(\"Aggregating - ShadowPricingOutput_work_9.csv\")\n        df_sp_wrk = pd.read_csv(os.path.join(model_dir, \"input\", \"ShadowPricingOutput_work_9.csv\"))\n\n        agg_instructions = {}\n        for col in df_sp_wrk.columns:\n            if \"size\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n            if \"shadowPrices\" in col:\n                agg_instructions.update({col: \"max\"})\n\n            if \"_origins\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n            if \"_modeledDests\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n        df_sp_wrk['mgra'] = df_sp_wrk['mgra'].map(mgra_cwk)\n\n        df_sp_wrk_agg = 
df_sp_wrk.groupby(['mgra']).agg(agg_instructions).reset_index()\n\n        alt = list(df_sp_wrk_agg['mgra'])\n        df_sp_wrk_agg.insert(loc=0, column=\"alt\", value=alt)\n\n        df_sp_wrk_agg.loc[len(df_sp_wrk_agg.index)] = 0\n\n        df_sp_wrk_agg.to_csv(os.path.join(rsm_dir, \"input\", \"ShadowPricingOutput_work_9.csv\"), index=False)\n\n    else:\n        FileNotFoundError(\"ShadowPricingOutput_work_9.csv\")\n\n    if \"TourDcSoaDistanceAlts.csv\" in input_files:\n        logging.info(\"Aggregating - TourDcSoaDistanceAlts.csv\")\n        df_TourDcSoaDistanceAlts = pd.DataFrame({\"a\" : range(1,taz_zones+1), \"dest\" : range(1, taz_zones+1)})\n        df_TourDcSoaDistanceAlts.to_csv(os.path.join(rsm_dir, \"uec\", \"TourDcSoaDistanceAlts.csv\"), index=False)\n\n    if \"DestinationChoiceAlternatives.csv\" in input_files:\n        logging.info(\"Aggregating - DestinationChoiceAlternatives.csv\")\n        df_DestinationChoiceAlternatives = pd.DataFrame({\"a\" : range(1,mgra_zones+1), \"mgra\" : range(1, mgra_zones+1)})\n        df_DestinationChoiceAlternatives.to_csv(os.path.join(rsm_dir, \"uec\", \"DestinationChoiceAlternatives.csv\"), index=False)\n\n    if \"SoaTazDistAlts.csv\" in input_files:\n        logging.info(\"Aggregating - SoaTazDistAlts.csv\")\n        df_SoaTazDistAlts = pd.DataFrame({\"a\" : range(1,taz_zones+1), \"dest\" : range(1, taz_zones+1)})\n        df_SoaTazDistAlts.to_csv(os.path.join(rsm_dir, \"uec\", \"SoaTazDistAlts.csv\"), index=False)\n\n    if \"TripMatrices.csv\" in input_files:\n        logging.info(\"Aggregating - TripMatrices.csv\")\n        trips = pd.read_csv(os.path.join(model_dir,\"output\", \"TripMatrices.csv\"))\n        trips['i'] = trips['i'].map(dict_clusters)\n        trips['j'] = trips['j'].map(dict_clusters)\n\n        cols = list(trips.columns)\n        cols.remove(\"i\")\n        cols.remove(\"j\")\n\n        trips_df = trips.groupby(['i', 'j'])[cols].sum().reset_index()\n        
trips_df.to_csv(os.path.join(rsm_dir, \"output\", \"TripMatrices.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"TripMatrices.csv\")\n\n    if \"transponderModelAccessibilities.csv\" in input_files:\n        logging.info(\"Aggregating - transponderModelAccessibilities.csv\")\n        tran_access = pd.read_csv(os.path.join(model_dir, \"output\", \"transponderModelAccessibilities.csv\"))\n        tran_access['TAZ'] = tran_access['TAZ'].map(dict_clusters)\n\n        tran_access_agg = tran_access.groupby(['TAZ'])[['DIST','AVGTTS','PCTDETOUR']].mean().reset_index()\n        tran_access_agg.to_csv(os.path.join(rsm_dir, \"output\",\"transponderModelAccessibilities.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"transponderModelAccessibilities.csv\")\n\n    if \"crossBorderTours.csv\" in input_files:\n        logging.info(\"Aggregating - crossBorderTours.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"crossBorderTours.csv\"))\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df['originTAZ'] = df['originTAZ'].map(dict_clusters)\n        df['destinationTAZ'] = df['destinationTAZ'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"crossBorderTours.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"crossBorderTours.csv\")\n\n    if \"crossBorderTrips.csv\" in input_files:\n        logging.info(\"Aggregating - crossBorderTrips.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"crossBorderTrips.csv\"))\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df['originTAZ'] = df['originTAZ'].map(dict_clusters)\n        df['destinationTAZ'] = df['destinationTAZ'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"crossBorderTrips.csv\"), index = False)\n\n    else:\n 
       raise FileNotFoundError(\"crossBorderTrips.csv\")\n\n    if \"internalExternalTrips.csv\" in input_files:\n        logging.info(\"Aggregating - internalExternalTrips.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"internalExternalTrips.csv\"))\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df['originTAZ'] = df['originTAZ'].map(dict_clusters)\n        df['destinationTAZ'] = df['destinationTAZ'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"internalExternalTrips.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"internalExternalTrips.csv\")\n\n    if \"visitorTours.csv\" in input_files:\n        logging.info(\"Aggregating - visitorTours.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"visitorTours.csv\"))\n\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"visitorTours.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"visitorTours.csv\")\n\n    if \"visitorTrips.csv\" in input_files:\n        logging.info(\"Aggregating - visitorTrips.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"visitorTrips.csv\"))\n\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"visitorTrips.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"visitorTrips.csv\")\n\n    if \"householdAVTrips.csv\" in input_files:\n        logging.info(\"Aggregating - householdAVTrips.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"householdAVTrips.csv\"))\n        #print(os.path.join(model_dir, \"output\", \"householdAVTrips.csv\"))\n        df['orig_mgra'] = 
df['orig_mgra'].map(mgra_cwk)\n        df['dest_gra'] = df['dest_gra'].map(mgra_cwk)\n\n        df['trip_orig_mgra'] = df['trip_orig_mgra'].map(mgra_cwk)\n        df['trip_dest_mgra'] = df['trip_dest_mgra'].map(mgra_cwk)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"householdAVTrips.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"householdAVTrips.csv\")\n\n    if \"airport_out.CBX.csv\" in input_files:\n        logging.info(\"Aggregating - airport_out.CBX.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"airport_out.CBX.csv\"))\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df['originTAZ'] = df['originTAZ'].map(dict_clusters)\n        df['destinationTAZ'] = df['destinationTAZ'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"airport_out.CBX.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"airport_out.CBX.csv\")\n\n    if \"airport_out.SAN.csv\" in input_files:\n        logging.info(\"Aggregating - airport_out.SAN.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"airport_out.SAN.csv\"))\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df['originTAZ'] = df['originTAZ'].map(dict_clusters)\n        df['destinationTAZ'] = df['destinationTAZ'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"airport_out.SAN.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"airport_out.SAN.csv\")\n\n    if \"TNCtrips.csv\" in input_files:\n        logging.info(\"Aggregating - TNCtrips.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"TNCtrips.csv\"))\n        df['originMgra'] = df['originMgra'].map(mgra_cwk)\n        df['destinationMgra'] = df['destinationMgra'].map(mgra_cwk)\n\n        df['originTaz'] = 
df['originTaz'].map(dict_clusters)\n        df['destinationTaz'] = df['destinationTaz'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"TNCtrips.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"TNCtrips.csv\")\n\n    files = [\"Trip\" + \"_\" + i + \"_\" + j + \".csv\" for i, j in\n                itertools.product([\"FA\", \"GO\", \"IN\", \"RE\", \"SV\", \"TH\", \"WH\"],\n                                   [\"OE\", \"AM\", \"MD\", \"PM\", \"OL\"])]\n\n    for file in files:\n        logging.info(f\"Aggregating - {file}\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", file))\n        df['I'] = df['I'].map(dict_clusters)\n        df['J'] = df['J'].map(dict_clusters)\n        df['HomeZone'] = df['HomeZone'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\",file), index = False)\n
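The shadow-pricing branches build their aggregation rules from column names. Columns containing size, _origins, or _modeledDests are summed, and columns containing shadowPrices take the maximum. The compact sketch below applies the same rule set to a toy table. The column names and values are invented for illustration, and build_agg_instructions is a hypothetical helper:

```python
import pandas as pd


def build_agg_instructions(columns):
    # Same substring rules as the shadow-pricing branches; a later match
    # overwrites an earlier one, as with the successive dict.update calls.
    rules = {}
    for col in columns:
        if 'size' in col:
            rules[col] = 'sum'
        if 'shadowPrices' in col:
            rules[col] = 'max'
        if '_origins' in col:
            rules[col] = 'sum'
        if '_modeledDests' in col:
            rules[col] = 'sum'
    return rules


df = pd.DataFrame({
    'mgra': [1, 1, 2],
    'size_univ': [5.0, 7.0, 1.0],
    'shadowPrices_univ': [0.2, -0.1, 0.0],
    'univ_origins': [3, 4, 1],
})
rules = build_agg_instructions(df.columns)
# The grouping key (mgra) matches no rule, so it is used only for grouping.
agg = df.groupby('mgra').agg(rules).reset_index()
print(agg)
```

Summing keeps size terms and origin counts additive across the MGRAs that merge. Taking the maximum keeps the strongest shadow-price adjustment within each aggregated zone.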
      "},{"location":"api.html#rsm.translate.copy_transit_demand","title":"copy_transit_demand(matrix_names, input_dir='.', output_dir='.')","text":"

      Copies the OMX transit demand matrices from input_dir to output_dir (the RSM directory) without aggregating them.

      "},{"location":"api.html#rsm.translate.copy_transit_demand--parameters","title":"Parameters","text":"

      matrix_names : matrix_names (list) omx matrix filenames to copy; \u201c.omx\u201d is appended when missing
      input_dir : input_dir (Path-like) default \u201c.\u201d
      output_dir : output_dir (Path-like) default \u201c.\u201d

      "},{"location":"api.html#rsm.translate.copy_transit_demand--returns","title":"Returns","text":"Source code in rsm/translate.py
      def copy_transit_demand(\n    matrix_names,\n    input_dir=\".\",\n    output_dir=\".\"\n):\n    \"\"\"\n    copies the omx transit demand matrices to the rsm directory\n\n    Parameters\n    ----------\n    matrix_names : matrix_names (list)\n        omx matrix filenames to copy\n    input_dir : input_dir (Path-like)\n        default \".\"\n    output_dir : output_dir (Path-like)\n        default \".\"\n\n    Returns\n    -------\n    None\n    \"\"\"\n\n\n    for mat_name in matrix_names:\n        if '.omx' not in mat_name:\n            mat_name = mat_name + \".omx\"\n\n        input_file_dir = os.path.join(input_dir, mat_name)\n        output_file_dir = os.path.join(output_dir, mat_name)\n\n        shutil.copy(input_file_dir, output_file_dir)\n
      "},{"location":"api.html#rsm.translate.translate_emmebank_demand","title":"translate_emmebank_demand(input_databank, output_databank, cores_to_aggregate, agg_zone_mapping)","text":"

      Aggregates the demand matrix cores in one Emme databank to the aggregated zone system and loads them into another databank.

      "},{"location":"api.html#rsm.translate.translate_emmebank_demand--parameters","title":"Parameters","text":"

      input_databank : input_databank (Emme databank) Emme databank
      output_databank : output_databank (Emme databank) Emme databank
      cores_to_aggregate : cores_to_aggregate (list) matrix core names to aggregate
      agg_zone_mapping : agg_zone_mapping (Path-like or pandas.DataFrame) zone number mapping between original and aggregated zones. Columns: original zones as \u2018taz\u2019 and aggregated zones as \u2018cluster_id\u2019

      "},{"location":"api.html#rsm.translate.translate_emmebank_demand--returns","title":"Returns","text":"

      None. Loads the trip matrices into emmebank.

      Source code in rsm/translate.py
      def translate_emmebank_demand(\n    input_databank,\n    output_databank,\n    cores_to_aggregate,\n    agg_zone_mapping,\n):\n    \"\"\"\n    aggregates the demand matrix cores from one emme databank and loads them into another databank\n\n    Parameters\n    ----------\n    input_databank : input_databank (Emme databank)\n        Emme databank\n    output_databank : output_databank (Emme databank)\n        Emme databank\n    cores_to_aggregate : cores_to_aggregate (list)\n        matrix corenames to aggregate\n    agg_zone_mapping: agg_zone_mapping (Path-like or pandas.DataFrame)\n        zone number mapping between original and aggregated zones.\n        columns: original zones as 'taz' and aggregated zones as 'cluster_id'\n\n    Returns\n    -------\n    None. Loads the trip matrices into emmebank.\n\n    \"\"\"\n\n    # accept either a DataFrame or a path to the crosswalk CSV\n    if isinstance(agg_zone_mapping, pd.DataFrame):\n        agg_zone_mapping_df = agg_zone_mapping.copy()\n    else:\n        agg_zone_mapping_df = pd.read_csv(agg_zone_mapping)\n\n    # normalize column names before sorting on them\n    agg_zone_mapping_df.columns = agg_zone_mapping_df.columns.str.strip().str.lower()\n    agg_zone_mapping_df = agg_zone_mapping_df.sort_values('taz')\n    zone_mapping = dict(zip(agg_zone_mapping_df['taz'], agg_zone_mapping_df['cluster_id']))\n\n    for core in cores_to_aggregate:\n        matrix = input_databank.matrix(core).get_data()\n        matrix_array = matrix.to_numpy()\n\n        matrix_agg = _aggregate_matrix(matrix_array, zone_mapping)\n\n        output_matrix = output_databank.matrix(core)\n        output_matrix.set_numpy_data(matrix_agg)\n
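The helper _aggregate_matrix is not reproduced in this reference. The sketch below assumes it sums demand from every original zone pair into the corresponding aggregated pair, which is the natural choice for trip tables. It also assumes the matrix rows follow sorted zone order and that aggregated ids are contiguous from 1. Under those assumptions, a NumPy version could look like this. The name aggregate_matrix_sketch is hypothetical:

```python
import numpy as np


def aggregate_matrix_sketch(matrix, zone_mapping):
    # Hypothetical stand-in for _aggregate_matrix, assuming:
    #  - rows/columns of `matrix` follow sorted(zone_mapping) order, and
    #  - aggregated ids are contiguous integers starting at 1.
    orig_zones = sorted(zone_mapping)
    if matrix.shape != (len(orig_zones), len(orig_zones)):
        raise ValueError('matrix shape does not match the zone mapping')
    # 0-based aggregated index for each original row/column
    idx = np.array([zone_mapping[z] for z in orig_zones]) - 1
    n_agg = int(idx.max()) + 1
    out = np.zeros((n_agg, n_agg), dtype=matrix.dtype)
    # np.add.at accumulates repeated (row, col) targets instead of overwriting,
    # so every original cell is added to its aggregated cell exactly once.
    np.add.at(out, (idx[:, None], idx[None, :]), matrix)
    return out


demand = np.arange(1, 10).reshape(3, 3)
result = aggregate_matrix_sketch(demand, {1: 1, 2: 1, 3: 2})
print(result)
```

An equivalent formulation is P.T @ M @ P, where P is a 0/1 zone-to-cluster indicator matrix. np.add.at avoids building P explicitly, and in both versions the total demand is preserved.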
      "},{"location":"api.html#rsm.translate.translate_omx_demand","title":"translate_omx_demand(matrix_names, agg_zone_mapping, input_dir='.', output_dir='.')","text":"

      Aggregates the OMX demand matrices to the aggregated zone system.

      "},{"location":"api.html#rsm.translate.translate_omx_demand--parameters","title":"Parameters","text":"

      matrix_names : matrix_names (list) omx matrix filenames to aggregate
      agg_zone_mapping : agg_zone_mapping (path_like or pandas.DataFrame) zone number mapping between original and aggregated zones. Columns: original zones as \u2018taz\u2019 and aggregated zones as \u2018cluster_id\u2019
      input_dir : input_dir (path_like) default \u201c.\u201d
      output_dir : output_dir (path_like) default \u201c.\u201d
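Both translate functions expect the crosswalk columns to be named taz and cluster_id after the header names are stripped and lower-cased. The sketch below builds the zone mapping from either an in-memory DataFrame or anything pd.read_csv accepts. It also normalizes the headers before sorting on them. load_zone_mapping is a hypothetical helper, not part of rsm:

```python
import pandas as pd


def load_zone_mapping(agg_zone_mapping):
    # Hypothetical helper: accept a DataFrame or anything pd.read_csv accepts.
    if isinstance(agg_zone_mapping, pd.DataFrame):
        df = agg_zone_mapping.copy()
    else:
        df = pd.read_csv(agg_zone_mapping)
    # Normalize headers first, so ' TAZ' and 'Cluster_ID' still match.
    df.columns = df.columns.str.strip().str.lower()
    df = df.sort_values('taz')
    # Cast to plain ints so the result does not carry NumPy scalar types.
    zone_mapping = {int(t): int(c) for t, c in zip(df['taz'], df['cluster_id'])}
    agg_zones = sorted(int(c) for c in df['cluster_id'].unique())
    return zone_mapping, agg_zones


crosswalk = pd.DataFrame({' TAZ': [3, 1, 2], 'Cluster_ID': [2, 1, 1]})
mapping, agg_zones = load_zone_mapping(crosswalk)
print(mapping)    # {1: 1, 2: 1, 3: 2}
print(agg_zones)  # [1, 2]
```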

      "},{"location":"api.html#rsm.translate.translate_omx_demand--returns","title":"Returns","text":"Source code in rsm/translate.py
      def translate_omx_demand(\n    matrix_names,\n    agg_zone_mapping,\n    input_dir=\".\",\n    output_dir=\".\"\n):\n    \"\"\"\n    Aggregate the omx demand matrices to the aggregated zone system.\n\n    Parameters\n    ----------\n    matrix_names : matrix_names (list)\n        omx matrix filenames to aggregate\n    agg_zone_mapping: agg_zone_mapping (path_like or pandas.DataFrame)\n        zone number mapping between original and aggregated zones.\n        columns: original zones as 'taz' and aggregated zones as 'cluster_id'\n    input_dir : input_dir (path_like)\n        default \".\"\n    output_dir : output_dir (path_like)\n        default \".\"\n\n    Returns\n    -------\n    None. Writes the aggregated matrices to output_dir.\n\n    \"\"\"\n\n    # accept either a CSV path or an already loaded DataFrame, as documented\n    if isinstance(agg_zone_mapping, pd.DataFrame):\n        agg_zone_mapping_df = agg_zone_mapping.copy()\n    else:\n        agg_zone_mapping_df = pd.read_csv(agg_zone_mapping)\n\n    # normalize column names before sorting on 'taz'\n    agg_zone_mapping_df.columns = agg_zone_mapping_df.columns.str.strip().str.lower()\n    agg_zone_mapping_df = agg_zone_mapping_df.sort_values('taz')\n    zone_mapping = dict(zip(agg_zone_mapping_df['taz'], agg_zone_mapping_df['cluster_id']))\n    agg_zones = sorted(agg_zone_mapping_df['cluster_id'].unique())\n\n    for mat_name in matrix_names:\n        if not mat_name.endswith('.omx'):\n            mat_name = mat_name + \".omx\"\n\n        input_skim_file = os.path.join(input_dir, mat_name)\n        output_skim_file = os.path.join(output_dir, mat_name)\n\n        if not os.path.isfile(input_skim_file):\n            raise FileNotFoundError(input_skim_file)\n\n        input_matrix = omx.open_file(input_skim_file, mode=\"r\")\n        input_mapping_name = input_matrix.list_mappings()[0]\n        input_cores = input_matrix.list_matrices()\n\n        output_matrix = omx.open_file(output_skim_file, mode=\"w\")\n\n        for core in input_cores:\n            matrix_array = input_matrix[core].read()\n            matrix_agg = _aggregate_matrix(matrix_array, zone_mapping)\n            output_matrix[core] = matrix_agg\n\n        output_matrix.create_mapping(title=input_mapping_name, entries=agg_zones)\n\n        input_matrix.close()\n        output_matrix.close()\n
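      Both translate functions document agg_zone_mapping as either a path or a pandas.DataFrame. The hypothetical helper `load_zone_mapping` sketches one way to accept both forms. It also normalizes the headers before sorting on 'taz'. It is an illustration that assumes integer zone ids, not part of the rsm package.

```python
from pathlib import Path

import pandas as pd


def load_zone_mapping(agg_zone_mapping):
    '''Return ({original zone: aggregated zone}, sorted aggregated zone ids).

    Hypothetical helper: accepts a CSV path or a DataFrame with columns
    'taz' and 'cluster_id' (letter case and surrounding spaces are ignored).
    Zone ids are assumed to be integers.
    '''
    if isinstance(agg_zone_mapping, pd.DataFrame):
        df = agg_zone_mapping.copy()
    elif isinstance(agg_zone_mapping, (str, Path)):
        df = pd.read_csv(Path(agg_zone_mapping).expanduser())
    else:
        raise TypeError('agg_zone_mapping must be path-like or a pandas.DataFrame')
    # normalize headers first so that ' TAZ' and 'taz' are treated alike
    df.columns = df.columns.str.strip().str.lower()
    df = df.sort_values('taz')
    zone_mapping = {int(t): int(c) for t, c in zip(df['taz'], df['cluster_id'])}
    agg_zones = sorted(int(c) for c in df['cluster_id'].unique())
    return zone_mapping, agg_zones


# usage with an in-memory crosswalk whose headers are not normalized
crosswalk = pd.DataFrame({' TAZ': [3, 1, 2], 'Cluster_ID': [2, 1, 1]})
mapping, zones = load_zone_mapping(crosswalk)
print(mapping, zones)  # {1: 1, 2: 1, 3: 2} [1, 2]
```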
      "},{"location":"api.html#rsm.sampler.rsm_household_sampler","title":"rsm_household_sampler(input_dir='.', output_dir='.', prev_iter_access=None, curr_iter_access=None, study_area=None, input_household='households.csv', input_person='persons.csv', taz_crosswalk='taz_crosswalk.csv', mgra_crosswalk='mgra_crosswalk.csv', compare_access_columns=('NONMAN_AUTO', 'NONMAN_TRANSIT', 'NONMAN_NONMOTOR', 'NONMAN_SOV_0'), default_sampling_rate=0.25, lower_bound_sampling_rate=0.15, upper_bound_sampling_rate=1.0, random_seed=42, output_household='sampled_households.csv', output_person='sampled_person.csv')","text":"

      Take an intelligent sampling of households.

      "},{"location":"api.html#rsm.sampler.rsm_household_sampler--parameters","title":"Parameters","text":"

      • input_dir (path_like): default '.'.
      • output_dir (path_like): default '.'.
      • prev_iter_access (Path-like or pandas.DataFrame): accessibility from an earlier (default, no-treatment, etc.) run, either preloaded or read from this path. Give a path relative to input_dir or an absolute path.
      • curr_iter_access (Path-like or pandas.DataFrame): accessibility from the latest run, either preloaded or read from this path. Give a path relative to input_dir or an absolute path.
      • study_area (array-like): RSM zones (numbered 1 to N in the RSM) in the study area. These zones are sampled at 100%.
      • input_household (Path-like or pandas.DataFrame): complete synthetic household file. It is filtered to the sampled households and written out to a new CSV file.
      • input_person (Path-like or pandas.DataFrame): complete synthetic persons file. It is filtered to the persons in the sampled households and written out to a new CSV file.
      • compare_access_columns (Collection[str]): column names in the accessibility file used to compare accessibility. Only changes in these columns are evaluated.
      • default_sampling_rate (float): the default sampling rate, in the range (0, 1].
      • lower_bound_sampling_rate (float): zone sampling rates are clipped so they are never lower than this.
      • upper_bound_sampling_rate (float): zone sampling rates are clipped so they are never higher than this.
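      The three sampling-rate parameters feed a zone-level rate calculation in rsm_household_sampler. That calculation works in four steps:

      1. Sum the absolute accessibility changes per zone and add 0.01.
      2. Rescale so the mean weight equals default_sampling_rate.
      3. Clip the weights to the lower and upper bounds.
      4. Set study-area zones to 1.

      The standalone sketch below covers just that step. `zone_sampling_rates` is a hypothetical name, and the accessibility values are made up.

```python
import pandas as pd


def zone_sampling_rates(prev_access, curr_access, columns, default_rate=0.25,
                        lower=0.15, upper=1.0, study_area=None):
    '''Per-zone sampling rates from accessibility changes.

    Sketch of the rsm_household_sampler logic; prev_access and curr_access
    are DataFrames indexed by RSM zone.
    '''
    cols = list(columns)  # a tuple would otherwise be read as a single column key
    # total absolute change in the compared accessibility columns, per zone
    total_change = (curr_access[cols] - prev_access[cols]).abs().sum(axis=1)
    # small offset so zones with no change still get a nonzero weight
    weights = total_change + 0.01
    # rescale so the average weight equals the default sampling rate
    weights = weights * default_rate / weights.mean()
    rates = weights.clip(lower=lower, upper=upper)
    if study_area is not None:
        # study area zones are always sampled at 100%
        rates.loc[rates.index.isin(study_area)] = 1.0
    return rates


zones = [1, 2, 3]
prev = pd.DataFrame({'NONMAN_AUTO': [0.0, 0.0, 0.0], 'NONMAN_TRANSIT': [0.0, 0.0, 0.0]}, index=zones)
curr = pd.DataFrame({'NONMAN_AUTO': [0.0, 0.5, 2.0], 'NONMAN_TRANSIT': [0.0, 0.5, 1.0]}, index=zones)
rates = zone_sampling_rates(prev, curr, ('NONMAN_AUTO', 'NONMAN_TRANSIT'), study_area=[1])
print(rates.round(3).tolist())  # [1.0, 0.188, 0.56]
```

      Without the study area, zone 1 would have no accessibility change and would fall to the lower bound of 0.15.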

      "},{"location":"api.html#rsm.sampler.rsm_household_sampler--returns","title":"Returns","text":"

      • sample_households_df, sample_persons_df (pandas.DataFrame): the sampled households and persons to resimulate. Both are also written to output_dir.

      Source code in rsm/sampler.py
      def rsm_household_sampler(\n    input_dir=\".\",\n    output_dir=\".\",\n    prev_iter_access=None,\n    curr_iter_access=None,\n    study_area=None,\n    input_household=\"households.csv\",\n    input_person=\"persons.csv\",\n    taz_crosswalk=\"taz_crosswalk.csv\",\n    mgra_crosswalk=\"mgra_crosswalk.csv\",\n    compare_access_columns=(\n        \"NONMAN_AUTO\",\n        \"NONMAN_TRANSIT\",\n        \"NONMAN_NONMOTOR\",\n        \"NONMAN_SOV_0\",\n    ),\n    default_sampling_rate=0.25,  # fix the values of this after some testing\n    lower_bound_sampling_rate=0.15,  # fix the values of this after some testing\n    upper_bound_sampling_rate=1.0,  # fix the values of this after some testing\n    random_seed=42,\n    output_household=\"sampled_households.csv\",\n    output_person=\"sampled_person.csv\",\n):\n    \"\"\"\n    Take an intelligent sampling of households.\n\n    Parameters\n    ----------\n    input_dir : input_dir (path_like)\n        default \".\"\n    output_dir : output_dir (path_like)\n        default \".\"\n    prev_iter_access : prev_iter_access (Path-like or pandas.DataFrame)\n        Accessibility in an old (default, no treatment, etc) run is given (preloaded)\n        or read in from here. Give as a relative path (from `input_dir`) or an\n        absolute path.\n    curr_iter_access : curr_iter_access (Path-like or pandas.DataFrame)\n        Accessibility in the latest run is given (preloaded) or read in from here.\n        Give as a relative path (from `input_dir`) or an absolute path.\n    study_area : study_area (array-like)\n        Array of RSM zones (numbered 1 to N in the RSM) in the study area. These zones are sampled at 100%.\n    input_household : input_household (Path-like or pandas.DataFrame)\n        Complete synthetic household file.  This data will be filtered to match the\n        sampling of households and written out to a new CSV file.\n    input_person : input_person (Path-like or pandas.DataFrame)\n        Complete synthetic persons file.  This data will be filtered to match the\n        sampling of households and written out to a new CSV file.\n    compare_access_columns : compare_access_columns (Collection[str])\n        Column names in the accessibility file to use for comparing accessibility.\n        Only changes in the values in these columns will be evaluated.\n    default_sampling_rate : default_sampling_rate (float)\n        The default sampling rate, in the range (0,1]\n    lower_bound_sampling_rate : lower_bound_sampling_rate (float)\n        Sampling rates by zone will be truncated so they are never lower than this.\n    upper_bound_sampling_rate : upper_bound_sampling_rate (float)\n        Sampling rates by zone will be truncated so they are never higher than this.\n\n    Returns\n    -------\n    sample_households_df, sample_persons_df : sample_households_df, sample_persons_df (pandas.DataFrame)\n        These are the sampled households and persons to resimulate.  
They are also written to\n        the output_dir\n    \"\"\"\n\n    input_dir = Path(input_dir or \".\")\n    output_dir = Path(output_dir or \".\")\n\n    logger.debug(\"CALL rsm_household_sampler\")\n    logger.debug(f\"  {input_dir=}\")\n    logger.debug(f\"  {output_dir=}\")\n\n    def _resolve_df(x, directory, make_index=None):\n        if isinstance(x, (str, Path)):\n            # read in the file to a pandas DataFrame\n            x = Path(x).expanduser()\n            if not x.is_absolute():\n                x = Path(directory or \".\").expanduser().joinpath(x)\n            try:\n                result = pd.read_csv(x)\n            except FileNotFoundError:\n                raise\n        elif isinstance(x, pd.DataFrame):\n            result = x\n        elif x is None:\n            result = None\n        else:\n            raise TypeError(\"must be path-like or DataFrame\")\n        if (\n            result is not None\n            and make_index is not None\n            and make_index in result.columns\n        ):\n            result = result.set_index(make_index)\n        return result\n\n    def _resolve_out_filename(x):\n        x = Path(x).expanduser()\n        if not x.is_absolute():\n            x = Path(output_dir).expanduser().joinpath(x)\n        x.parent.mkdir(parents=True, exist_ok=True)\n        return x\n\n    prev_iter_access_df = _resolve_df(\n        prev_iter_access, input_dir, make_index=\"MGRA\"\n    )\n    curr_iter_access_df = _resolve_df(\n        curr_iter_access, input_dir, make_index=\"MGRA\"\n    )\n    rsm_zones = _resolve_df(taz_crosswalk, input_dir)\n    dict_clusters = dict(zip(rsm_zones[\"taz\"], rsm_zones[\"cluster_id\"]))\n\n    rsm_mgra_zones = _resolve_df(mgra_crosswalk, input_dir)\n    rsm_mgra_zones.columns = rsm_mgra_zones.columns.str.strip().str.lower()\n    dict_clusters_mgra = dict(zip(rsm_mgra_zones[\"mgra\"], rsm_mgra_zones[\"cluster_id\"]))\n\n    # changing the taz and mgra to new cluster ids\n    
input_household_df = _resolve_df(input_household, input_dir)\n    input_household_df[\"taz\"] = input_household_df[\"taz\"].map(dict_clusters)\n    input_household_df[\"mgra\"] = input_household_df[\"mgra\"].map(dict_clusters_mgra)\n    input_household_df[\"count\"] = 1\n\n    mgra_hh = input_household_df.groupby([\"mgra\"]).size().rename(\"n_hh\").to_frame()\n\n    if curr_iter_access_df is None or prev_iter_access_df is None:\n\n        if curr_iter_access_df is None:\n            logger.warning(f\"missing curr_iter_access_df from {curr_iter_access}\")\n        if prev_iter_access_df is None:\n            logger.warning(f\"missing prev_iter_access_df from {prev_iter_access}\")\n        # true when sampler is turned off. default_sampling_rate should be set to 1\n\n        mgra_hh[\"sampling_rate\"] = default_sampling_rate\n        if study_area is not None:\n            # study area zones are sampled at 100%\n            mgra_hh.loc[mgra_hh.index.isin(study_area), \"sampling_rate\"] = 1\n\n        sample_households = []\n\n        for mgra_id, row in mgra_hh.iterrows():\n            df = input_household_df.loc[input_household_df[\"mgra\"] == mgra_id]\n            sampling_rate = row[\"sampling_rate\"]\n            logger.info(f\"Sampling rate of RSM zone {mgra_id}: {sampling_rate}\")\n            df = df.sample(frac=sampling_rate, random_state=mgra_id + random_seed)\n            sample_households.append(df)\n\n        # combine study area and non-study area households into single dataframe\n        sample_households_df = pd.concat(sample_households)\n\n    else:\n        # restrict to rows only where TAZs have households\n        prev_iter_access_df = prev_iter_access_df[\n            prev_iter_access_df.index.isin(mgra_hh.index)\n        ].copy()\n        curr_iter_access_df = curr_iter_access_df[\n            curr_iter_access_df.index.isin(mgra_hh.index)\n        ].copy()\n\n        # compare accessibility columns\n        compare_results = pd.DataFrame()\n\n        for column in compare_access_columns:\n            compare_results[column] = (\n                curr_iter_access_df[column] - prev_iter_access_df[column]\n            ).abs()  # take absolute difference\n        compare_results[\"MGRA\"] = prev_iter_access_df.index\n\n        compare_results = compare_results.set_index(\"MGRA\")\n\n        # Take row sums of all differences\n        compare_results[\"Total\"] = compare_results[list(compare_access_columns)].sum(\n            axis=1\n        )\n\n        # TODO: potentially adjust this later after we figure out a better approach\n        wgts = compare_results[\"Total\"] + 0.01\n        wgts /= wgts.mean() / default_sampling_rate\n        compare_results[\"sampling_rate\"] = np.clip(\n            wgts, lower_bound_sampling_rate, upper_bound_sampling_rate\n        )\n\n        sample_households = []\n        sample_rate_df = compare_results[[\"sampling_rate\"]].copy()\n        if study_area is not None:\n            sample_rate_df.loc[\n                sample_rate_df.index.isin(study_area), \"sampling_rate\"\n            ] = 1\n\n        for mgra_id, row in sample_rate_df.iterrows():\n            df = input_household_df.loc[input_household_df[\"mgra\"] == mgra_id]\n            sampling_rate = row[\"sampling_rate\"]\n            logger.info(f\"Sampling rate of RSM zone {mgra_id}: {sampling_rate}\")\n            df = df.sample(frac=sampling_rate, random_state=mgra_id + random_seed)\n            sample_households.append(df)\n\n        # combine study area and non-study area households into single dataframe\n        sample_households_df = pd.concat(sample_households)\n\n    sample_households_df = sample_households_df.sort_values(by=[\"hhid\"])\n    sample_households_df.to_csv(_resolve_out_filename(output_household), index=False)\n\n    # select persons belonging to sampled households\n    sample_hhids = sample_households_df[\"hhid\"].to_numpy()\n\n    persons_df = _resolve_df(input_person, input_dir)\n    sample_persons_df = 
persons_df.loc[persons_df[\"hhid\"].isin(sample_hhids)]\n    sample_persons_df.to_csv(_resolve_out_filename(output_person), index=False)\n\n    global_sample_rate = round(len(sample_households_df) / len(input_household_df),2)\n    logger.info(f\"Total Sampling Rate : {global_sample_rate}\")\n\n    return sample_households_df, sample_persons_df\n
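      The source above draws households zone by zone with DataFrame.sample(frac=rate, random_state=zone + random_seed). It then keeps only the persons of the drawn households. The sketch below condenses that step and runs it on toy data. `sample_by_zone` and the toy tables are illustrative, not part of rsm.

```python
import pandas as pd


def sample_by_zone(households, persons, rates, random_seed=42):
    '''Draw households zone by zone at the given rates and keep their persons.

    Condensed, hypothetical version of the sampling loop in rsm_household_sampler.
    '''
    sampled = []
    for zone_id, rate in rates.items():
        zone_hh = households.loc[households['mgra'] == zone_id]
        # zone id + global seed gives each zone its own reproducible draw
        sampled.append(zone_hh.sample(frac=rate, random_state=int(zone_id) + random_seed))
    sampled_hh = pd.concat(sampled).sort_values(by='hhid')
    sampled_persons = persons.loc[persons['hhid'].isin(sampled_hh['hhid'])]
    return sampled_hh, sampled_persons


# toy data: 8 households in RSM zone 1, 4 in zone 2, two persons per household
households = pd.DataFrame({'hhid': range(1, 13), 'mgra': [1] * 8 + [2] * 4})
persons = pd.DataFrame({'hhid': [h for h in range(1, 13) for _ in range(2)]})
hh, per = sample_by_zone(households, persons, pd.Series({1: 0.5, 2: 1.0}))
print(len(hh), len(per))  # 8 16
```

      Because each zone uses a fixed seed, rerunning with the same rates returns the same households. This keeps RSM runs repeatable.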
      "},{"location":"api.html#rsm.assembler.rsm_assemble","title":"rsm_assemble(orig_indiv, orig_joint, rsm_indiv, rsm_joint, households, mgra_crosswalk=None, sample_rate=0.25, run_assembler=1)","text":"

      Assemble and evaluate RSM trip making.

      "},{"location":"api.html#rsm.assembler.rsm_assemble--parameters","title":"Parameters","text":"

      • orig_indiv (path_like): trips table from the 'original' model run; should be a comprehensive simulation of all individual trips for all synthetic households.
      • orig_joint (path_like): joint trips table from the 'original' model run; should be a comprehensive simulation of all joint trips for all synthetic households.
      • rsm_indiv (path_like): trips table from the RSM model run; a simulation of all individual trips for potentially only a subset of all synthetic households.
      • rsm_joint (path_like): joint trips table from the RSM model run; a simulation of all joint trips for potentially only a subset of all synthetic households (the same sampled households as in rsm_indiv).
      • households (path_like): synthetic household file, used to get home zones for households.
      • mgra_crosswalk (path_like, optional): crosswalk from original MGRA to clustered zone ids. Provide it if the orig_indiv and orig_joint files reference the original MGRA system and those ids need to be converted to aggregated values before merging.
      • sample_rate (float): default/fixed sample rate used when the sampler is turned off; it scales the trips if run_assembler is 0.
      • run_assembler (boolean): flag indicating whether to run the RSM assembler. 1 runs the assembler and 0 turns it off; setting it to 0 is only an option if the sampler is turned off.
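      When run_assembler is 0, each trip of the sampled households stands in for int(1.0/sample_rate) trips, so a 25% sample is expanded fourfold. The sketch below shows that expansion on a made-up trip table. `scale_trips` is a hypothetical helper. The factor is truncated to an integer, so a rate that does not divide 1 evenly under-expands: 0.15 gives 6 copies rather than about 6.67.

```python
import pandas as pd


def scale_trips(trips, sample_rate):
    '''Replicate each sampled trip so totals approximate a 100% sample.

    Hypothetical helper mirroring the run_assembler = 0 behavior.
    '''
    # integer expansion factor: 4 at sample_rate 0.25; any fraction is dropped
    scale_factor = int(1.0 / sample_rate)
    # repeating the index (rather than the raw values) keeps every column's dtype
    return trips.loc[trips.index.repeat(scale_factor)].reset_index(drop=True)


trips = pd.DataFrame({'hh_id': [1, 2], 'trip_mode': [1, 7], 'distance': [1.5, 2.0]})
scaled = scale_trips(trips, 0.25)
print(len(scaled), scaled['hh_id'].tolist())  # 8 [1, 1, 1, 1, 2, 2, 2, 2]
```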

      "},{"location":"api.html#rsm.assembler.rsm_assemble--returns","title":"Returns","text":"

      • final_ind_trips (pd.DataFrame): assembled individual trip table for the RSM run, filling in archived trip values for non-resimulated households.
      • final_jnt_trips (pd.DataFrame): assembled joint trip table for the RSM run, filled in the same way.

      When run_assembler is 0, both tables are the RSM trips scaled by the fixed sample rate. Individual and joint trips are returned as separate tables, as required by Java.

      Source code in rsm/assembler.py
      def rsm_assemble(\n    orig_indiv,\n    orig_joint,\n    rsm_indiv,\n    rsm_joint,\n    households,\n    mgra_crosswalk=None,\n    sample_rate=0.25,\n    run_assembler=1\n):\n    \"\"\"\n    Assemble and evaluate RSM trip making.\n\n    Parameters\n    ----------\n    orig_indiv : orig_indiv (path_like)\n        Trips table from \"original\" model run, should be comprehensive simulation\n        of all individual trips for all synthetic households.\n    orig_joint : orig_joint (path_like)\n        Joint trips table from \"original\" model run, should be comprehensive simulation\n        of all joint trips for all synthetic households.\n    rsm_indiv : rsm_indiv (path_like)\n        Trips table from RSM model run, should be a simulation of all individual\n        trips for potentially only a subset of all synthetic households.\n    rsm_joint : rsm_joint (path_like)\n        Trips table from RSM model run, should be a simulation of all joint\n        trips for potentially only a subset of all synthetic households (the\n        same sampled households as in `rsm_indiv`).\n    households : households (path_like)\n        Synthetic household file, used to get home zones for households.\n    mgra_crosswalk : mgra_crosswalk (path_like, optional)\n        Crosswalk from original MGRA to clustered zone ids.  Provide this crosswalk\n        if the `orig_indiv` and `orig_joint` files reference the original MGRA system\n        and those id's need to be converted to aggregated values before merging.\n    sample_rate : sample_rate (float)\n        Default/fixed sample rate used when the sampler is turned off;\n        used to scale the trips if run_assembler is 0\n    run_assembler : run_assembler (boolean)\n        Flag to indicate whether to run RSM assembler or not.\n        1 is to run assembler, 0 is to turn it off;\n        setting this to 0 is only an option if sampler is turned off\n\n    Returns\n    -------\n    final_ind_trips : final_ind_trips (pd.DataFrame)\n        Assembled individual trip table for RSM run, filling in archived trip values for\n        non-resimulated households.\n    final_jnt_trips : final_jnt_trips (pd.DataFrame)\n        Assembled joint trip table for RSM run, filled in the same way.\n\n    Separate tables for individual and joint trips, as required by Java.\n\n    \"\"\"\n    orig_indiv = Path(orig_indiv).expanduser()\n    orig_joint = Path(orig_joint).expanduser()\n    rsm_indiv = Path(rsm_indiv).expanduser()\n    rsm_joint = Path(rsm_joint).expanduser()\n    households = Path(households).expanduser()\n\n    assert os.path.isfile(orig_indiv)\n    assert os.path.isfile(orig_joint)\n    assert os.path.isfile(rsm_indiv)\n    assert os.path.isfile(rsm_joint)\n    assert os.path.isfile(households)\n\n    if mgra_crosswalk is not None:\n        mgra_crosswalk = Path(mgra_crosswalk).expanduser()\n        assert os.path.isfile(mgra_crosswalk)\n\n    # load trip data - partial simulation of RSM model\n    logger.info(\"reading ind_trips_rsm\")\n    ind_trips_rsm = pd.read_csv(rsm_indiv)\n    logger.info(\"reading jnt_trips_rsm\")\n    jnt_trips_rsm = pd.read_csv(rsm_joint)\n\n    if run_assembler == 1:\n        # load trip data - full simulation of residual/source model\n        logger.info(\"reading ind_trips_full\")\n        ind_trips_full = pd.read_csv(orig_indiv)\n        logger.info(\"reading jnt_trips_full\")\n        jnt_trips_full = pd.read_csv(orig_joint)\n\n        if mgra_crosswalk is not None:\n            logger.info(\"applying mgra_crosswalk to original data\")\n            mgra_crosswalk = pd.read_csv(mgra_crosswalk).set_index(\"MGRA\")[\"cluster_id\"]\n            mgra_crosswalk[-1] = -1\n            mgra_crosswalk[0] = 0\n            for col in [c for c in ind_trips_full.columns if c.lower().endswith(\"_mgra\")]:\n                ind_trips_full[col] = ind_trips_full[col].map(mgra_crosswalk)\n            for col in [c for c in jnt_trips_full.columns if c.lower().endswith(\"_mgra\")]:\n                jnt_trips_full[col] = jnt_trips_full[col].map(mgra_crosswalk)\n\n        # convert to rsm trips\n        logger.info(\"convert to common table platform\")\n        rsm_trips = _merge_joint_and_indiv_trips(ind_trips_rsm, jnt_trips_rsm)\n        original_trips = _merge_joint_and_indiv_trips(ind_trips_full, jnt_trips_full)\n\n        logger.info(\"get all hhids in trips produced by RSM\")\n        hh_ids_rsm = rsm_trips[\"hh_id\"].unique()\n\n        logger.info(\"remove original model trips made by households chosen in RSM trips\")\n        original_trips_not_resimulated = original_trips.loc[\n            ~original_trips[\"hh_id\"].isin(hh_ids_rsm)\n        ]\n        original_ind_trips_not_resimulated = ind_trips_full[\n            ~ind_trips_full[\"hh_id\"].isin(hh_ids_rsm)\n        ]\n        original_jnt_trips_not_resimulated = jnt_trips_full[\n            ~jnt_trips_full[\"hh_id\"].isin(hh_ids_rsm)\n        ]\n\n        logger.info(\"concatenate trips from rsm and original model\")\n        final_trips_rsm = pd.concat(\n            [rsm_trips, original_trips_not_resimulated], ignore_index=True\n        ).reset_index(drop=True)\n        final_ind_trips = pd.concat(\n            [ind_trips_rsm, original_ind_trips_not_resimulated], ignore_index=True\n        ).reset_index(drop=True)\n        final_jnt_trips = pd.concat(\n            [jnt_trips_rsm, original_jnt_trips_not_resimulated], ignore_index=True\n        ).reset_index(drop=True)\n\n        # Get percentage change in total trips by mode for each home zone\n\n        # extract trips made by households in RSM and Original model\n        original_trips_that_were_resimulated = original_trips.loc[\n            
original_trips[\"hh_id\"].isin(hh_ids_rsm)\n        ]\n\n        def _agg_by_hhid_and_tripmode(df, name):\n            return df.groupby([\"hh_id\", \"trip_mode\"]).size().rename(name).reset_index()\n\n        # combining trips by hhid and trip mode\n        combined_trips = pd.merge(\n            _agg_by_hhid_and_tripmode(original_trips_that_were_resimulated, \"n_trips_orig\"),\n            _agg_by_hhid_and_tripmode(rsm_trips, \"n_trips_rsm\"),\n            on=[\"hh_id\", \"trip_mode\"],\n            how=\"outer\",\n            sort=True,\n        ).fillna(0)\n\n        # aggregating by Home zone\n        hh_rsm = pd.read_csv(households)\n        hh_id_col_names = [\"hhid\", \"hh_id\", \"household_id\"]\n        for hhid in hh_id_col_names:\n            if hhid in hh_rsm.columns:\n                break\n        else:\n            raise KeyError(f\"none of {hh_id_col_names!r} in household file\")\n        homezone_col_names = [\"mgra\", \"home_mgra\"]\n        for zoneid in homezone_col_names:\n            if zoneid in hh_rsm.columns:\n                break\n        else:\n            raise KeyError(f\"none of {homezone_col_names!r} in household file\")\n        hh_rsm = hh_rsm[[hhid, zoneid]]\n\n        # attach home zone id\n        combined_trips = pd.merge(\n            combined_trips, hh_rsm, left_on=\"hh_id\", right_on=hhid, how=\"left\"\n        )\n\n        combined_trips_by_zone = (\n            combined_trips.groupby([zoneid, \"trip_mode\"])[[\"n_trips_orig\", \"n_trips_rsm\"]]\n            .sum()\n            .reset_index()\n        )\n\n        combined_trips_by_zone = combined_trips_by_zone.eval(\n            \"net_change = (n_trips_rsm - n_trips_orig)\"\n        )\n\n        combined_trips_by_zone[\"max_trips\"] = np.fmax(\n            combined_trips_by_zone.n_trips_rsm, combined_trips_by_zone.n_trips_orig\n        )\n        combined_trips_by_zone = combined_trips_by_zone.eval(\n            \"pct_change = net_change / max_trips * 100\"\n        )\n   
      combined_trips_by_zone = combined_trips_by_zone.drop(columns=\"max_trips\")\n    else:\n        # if the assembler is turned off,\n        # scale the trips in the trip list using the fixed sample rate\n        # so that the final trip lists represent 100% of households\n        scale_factor = int(1.0/sample_rate)\n\n        # concat is slow; repeating the index is fast and, unlike np.repeat on\n        # .values, keeps each column's original dtype\n        # https://stackoverflow.com/questions/50788508/how-can-i-replicate-rows-of-a-pandas-dataframe\n        final_ind_trips = ind_trips_rsm.loc[\n            ind_trips_rsm.index.repeat(scale_factor)\n        ].reset_index(drop=True)\n\n        final_jnt_trips = jnt_trips_rsm.loc[\n            jnt_trips_rsm.index.repeat(scale_factor)\n        ].reset_index(drop=True)\n\n    return final_ind_trips, final_jnt_trips\n
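      With run_assembler set to 1, the core of the assembler is a replacement rule. Households that were resimulated take their trips from the RSM tables, and all other households keep their trips from the original full run. The sketch below applies that rule to toy tables. `assemble_trips` is a hypothetical name, and the data is illustrative.

```python
import pandas as pd


def assemble_trips(rsm_trips, original_trips):
    '''Combine RSM trips with original trips of households that were not resimulated.'''
    resimulated = rsm_trips['hh_id'].unique()
    kept = original_trips.loc[~original_trips['hh_id'].isin(resimulated)]
    return pd.concat([rsm_trips, kept], ignore_index=True)


# household 2 was resimulated in the RSM run; households 1 and 3 were not
original = pd.DataFrame({'hh_id': [1, 1, 2, 3], 'trip_mode': [1, 1, 1, 1]})
rsm = pd.DataFrame({'hh_id': [2, 2], 'trip_mode': [7, 7]})
final = assemble_trips(rsm, original)
print(final)
```

      As in the source, resimulated households are identified from the RSM trip table. A sampled household that makes no trips in the RSM run therefore keeps its original trips.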
      "},{"location":"assessment.html","title":"Assessment","text":""},{"location":"assessment.html#rsm-configuration","title":"RSM Configuration","text":"

      Several scenario runs with varying configurations were performed during RSM development. Their purpose was to select the final set of configuration parameters used for the overall assessment of the RSM.

      TODO: Include table with different configurations and corresponding run time.

      "},{"location":"assessment.html#calibration","title":"Calibration","text":"

      Aggregating the ABM zones to RSM zones distorts the share of walk trips coming out of the model. Tour mode choice calibration was therefore performed using the RSM model configuration identified above (Rapid Zones, Global Iterations, Sample Rate, etc.). The goal was to match the RSM mode share to the ABM2+ mode share, primarily for walk trips. A calibration constant was applied in the tour mode choice UEC for the School, Maintenance, and Discretionary tour purposes. The mode shares for the Work and University purposes were reasonable, so no calibration was applied to those purposes.
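      The page does not say how the calibration constants were derived. One common approach for logit mode choice models, shown here only as an assumed illustration, adds ln(target share / model share) to a mode's constant and repeats until the shares converge. In a constants-only model a single update reproduces the targets exactly. In a full tour mode choice model, utilities differ across tours, so several iterations are usually needed. The shares below are made up.

```python
import math


def logit_shares(constants):
    '''Mode shares of a constants-only multinomial logit model.'''
    exp_u = {mode: math.exp(c) for mode, c in constants.items()}
    total = sum(exp_u.values())
    return {mode: v / total for mode, v in exp_u.items()}


def update_constants(constants, target_shares, model_shares):
    '''One log-ratio adjustment of mode constants (assumed calibration approach).'''
    return {
        mode: c + math.log(target_shares[mode] / model_shares[mode])
        for mode, c in constants.items()
    }


constants = {'auto': 0.0, 'transit': -1.0, 'walk': -2.0}
targets = {'auto': 0.7, 'transit': 0.1, 'walk': 0.2}  # illustrative, not ABM2+ shares
calibrated = update_constants(constants, targets, logit_shares(constants))
print({m: round(s, 3) for m, s in logit_shares(calibrated).items()})
# {'auto': 0.7, 'transit': 0.1, 'walk': 0.2}
```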

      Note that a minor recalibration will be required for the RSM when the number of rapid zones is changed.

      Here is how the mode share and VMT compare before and after the RSM calibration. The donor model in the charts below refers to the ABM2+ run.

      "},{"location":"assessment.html#base-year-validation","title":"Base Year Validation","text":"

      The table below compares ABM2+ and RSM outcomes after the RSM calibration, using some of the key regional-level metrics. The roadway segments on I-5 and I-8 used for the volume comparison were chosen at random.

      "},{"location":"assessment.html#runtime-comparison","title":"Runtime Comparison","text":"

      For the base year 2016 simulation, the runtime comparison of ABM2+ and RSM is shown below.

      "},{"location":"assessment.html#sensitivity-testing","title":"Sensitivity Testing","text":"

      After the RSM was validated for the base year with the chosen design configuration, it was used to carry out hypothetical planning studies related to some broader use cases. For each sensitivity test, model results from the RSM and ABM2+ were compared to assess the performance of the RSM and to evaluate whether it could be a viable tool for such policy planning.

      For each test, a few key metrics were compared across four scenario runs: ABM2+ No Action, ABM2+ Action, RSM No Action, and RSM Action. The goal was for the RSM and ABM2+ to show similar sensitivities between the Action and No Action scenarios.
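      Comparing sensitivities means comparing each model's Action versus No Action response rather than raw levels, since the two models can differ in their base values. The sketch below shows that arithmetic with placeholder numbers. The function names are hypothetical, and the page does not specify which comparison statistic was used.

```python
def pct_change(no_action, action):
    '''Percent change from the No Action to the Action scenario.'''
    return (action - no_action) / no_action * 100.0


def sensitivity_gap(abm_no_action, abm_action, rsm_no_action, rsm_action):
    '''RSM response minus ABM2+ response, in percentage points.'''
    return pct_change(rsm_no_action, rsm_action) - pct_change(abm_no_action, abm_action)


# placeholder regional VMT values, not results from this assessment:
# ABM2+ drops 5.0%, RSM drops 4.5%
print(round(sensitivity_gap(100.0, 95.0, 200.0, 191.0), 2))  # 0.5
```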

      "},{"location":"assessment.html#regional-highway-changes","title":"Regional Highway Changes","text":""},{"location":"assessment.html#auto-operating-cost-50-increase","title":"Auto Operating Cost - 50% Increase","text":""},{"location":"assessment.html#auto-operating-cost-50-decrease","title":"Auto Operating Cost - 50% Decrease","text":""},{"location":"assessment.html#ride-hailing-cost-50-decrease","title":"Ride Hailing Cost - 50% decrease","text":""},{"location":"assessment.html#automated-vehicles-100-adoption","title":"Automated Vehicles - 100% Adoption","text":"

      In the SANDAG model, AV adoption is analyzed by capturing zero-occupancy vehicle movements as simulated in the Household AV Allocation module. The RSM skips this module, so it is not a viable tool for evaluating policies related to automated vehicles.

      "},{"location":"assessment.html#land-use-changes","title":"Land Use Changes","text":"

      RSM and ABM2+ show similar sensitivities for the two tested land use change scenarios.

      "},{"location":"assessment.html#change-in-land-use-job-housing-balance","title":"Change in land use - Job Housing Balance","text":""},{"location":"assessment.html#change-in-land-use-mixed-land-use","title":"Change in land use - Mixed Land Use","text":""},{"location":"assessment.html#regional-transit-changes","title":"Regional Transit Changes","text":""},{"location":"assessment.html#transit-fare","title":"Transit Fare","text":"

      TODO: Add some text to explain how this test was performed using the study area parameter TODO: Add outcome screenshot

      "},{"location":"assessment.html#transit-frequency","title":"Transit Frequency","text":"

      TODO: Add some text to explain how this test was performed using the study area parameter TODO: Add outcome screenshot

      "},{"location":"assessment.html#local-highway-changes","title":"Local Highway Changes","text":""},{"location":"assessment.html#managed-lane-conversion","title":"Managed Lane Conversion","text":"

      TODO: Add some text to explain how this test was performed using the study area parameter TODO: Add outcome screenshot

      "},{"location":"assessment.html#local-transit-changes","title":"Local Transit Changes","text":""},{"location":"assessment.html#rapid-637-brt","title":"Rapid 637 BRT","text":"

      TODO: Add some text to explain how this test was performed using the study area parameter TODO: Add outcome screenshot

      "},{"location":"assessment.html#use-cases-and-key-limitations","title":"Use Cases and Key Limitations","text":"

      The tests done as part of this project show that the RSM performs well for regional-scale roadway projects (e.g., auto operating costs and mileage fees, TNC costs and wait times). It also performs well for regional-scale transit projects (e.g., transit fare and headway changes) and for land-use change policies. Lastly, the RSM was tested for local roadway changes (e.g., managed lane conversion) and local transit changes (e.g., a new BRT line). The results indicate that the RSM represents those policies reasonably well too.

      Here are some of the current limitations of RSM:

      • The scope of the RSM is 'passenger' travel. Policies and/or infrastructure that primarily impact commercial travel (e.g., truck lanes) will not be well represented.
      • Minor re-calibration of the mode choice was necessary to match observed walk trips. Large changes to the number of zones will likely require recalibration.
      • The spatial aggregation reduces the RSM's ability to simulate infrastructure and/or policies that act at small scales (e.g., pedestrian infrastructure).
      • Policies related to the adoption of automated vehicles cannot currently be represented, because the RSM skips the Household AV Allocation module.
      • While the RSM has been tested, the testing has not been extensive, and more extensive testing is likely to surface additional issues. Additional testing will be required to evaluate whether the RSM can be a viable tool for other policies of interest to SANDAG.
      "},{"location":"development.html","title":"Development","text":""},{"location":"development.html#needs","title":"Needs","text":"

      The time needed to configure, run, and summarize results from ABM2+ is too slow to support a nimble, challenging, and engagement-oriented planning process. SANDAG needed a tool that quickly approximates the outcomes of ABM2+. The rapid strategic model, or RSM, was built for this purpose.

      ABM2+ Schematic is shown below

      "},{"location":"development.html#design-considerations","title":"Design Considerations","text":"

      Reducing the number of zones reduces model runtime.

      • MGRAs are aggregated into Rapid Zones based on their proximity to each other and their similarity with regard to mode choice decisions.
      • RSM will have a variable number of analysis zones, which can be quickly changed to assess trade-offs between runtime and how well the RSM results match the ABM2+ results.
      • Initial testing revealed 2,000 rapid zones is approximately optimal and will be used in initial deployments. For reference, ABM2+ has ~23,000 MGRAs and ~5,000 TAZs.

      Reducing the number of model components reduces runtime.

      • Most, but not all, of the policies of interest to SANDAG primarily impact resident passenger travel.
      • Therefore, RSM will run only the passenger travel component, holding the other demand components fixed.

      Reducing the number of global iterations reduces runtime.

      • If the RSM results remain in the same ballpark as ABM2+, reduce the number of global iterations from 3 to 2.

      Reducing sample rate reduces runtime.

      • The runtime of the resident model decreases when a smaller share of the population is simulated.
      • ABM2+ simulates 25 percent of the population in the first iteration, 50 percent in the second, and 100 percent in the third.
      • RSM will attempt to sample the population intelligently, varying the sample rate by TAZ: higher in zones with large changes in accessibility and lower in zones with small changes in accessibility.
      • RSM could also have higher sampling in zones around the analysis project and lower elsewhere.
      "},{"location":"development.html#architecture","title":"Architecture","text":"

      The RSM is developed as a Python package, and the required modules are launched when running the existing SANDAG travel model as the Rapid Model. It takes a complete ABM2+ model run as input and has the following modules:

      "},{"location":"development.html#zone-aggregator","title":"Zone Aggregator","text":"

      The RSM zone creator/aggregator creates a set of RSM analysis zones (Rapid Zones) and a set of RSM input files compatible with that zone system, using a donor model run (ABM2+/ABM3) as input. From the donor model, the inputs include the MGRA shapefile (MGRASHAPE.zip), the MGRA socioeconomic file (e.g., mgra13_based_input2016.csv), and the individual trips file (indivTripData_3.csv). The module produces a new MGRA socioeconomic file for the RSM zones and crosswalk files between the original TAZs/MGRAs and the rapid zones. In the model properties file (sandag_abm.properties), the user can also specify other parameters:
      • the number of RSM zones
      • the donor model run directory
      • the number of external zones
      • the MGRA socioeconomic file
      • the names of the crosswalk files generated by the zone aggregator module
      • an optional study area file (to study localized changes in the region)
      • the RSM zone centroid CSV file

      At its core, the RSM zone aggregator performs several steps.

      1. The MGRA geographies are loaded from the shapefile, MGRA data is loaded from the MGRA socioeconomic file, and trip data is extracted from the individual trip file.
      2. Additional variables, such as intersection counts and density variables, are computed from the MGRA data.
      3. The script aggregates the MGRA attributes to create new zone data by TAZ (Traffic Analysis Zone).
      4. The individual trips file is used to calculate the mode shares for each TAZ.
      5. Travel times from each TAZ to a set of points of interest are added to the aggregated TAZ data. By default these are San Diego city hall, outside Pendleton gate, Escondido city hall, Viejas casino, and San Ysidro trolley.
      6. The TAZs are clustered into a user-defined number of RSM zones using a clustering algorithm and several cluster factors. The default factors and weights are \u201cpopden\u201d: 1, \u201cempden\u201d: 1, \u201cmodeshare_NM\u201d: 100, \u201cmodeshare_WT\u201d: 100. The current scripts support the KMeans and agglomerative clustering algorithms.

      If the user has specified a study area, the study-area TAZs are handled separately and grouped into clusters exactly as specified in the study area file. The remaining TAZs are aggregated by the clustering algorithm.
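      The clustering step can be illustrated with a small, self-contained sketch. It assumes each TAZ is a dictionary holding the four default cluster factors. It z-scores each factor before applying the weights; whether the RSM code normalizes factors this way is an assumption. It uses a hand-written Lloyd k-means with farthest-first seeding, which is an illustration and not the RSM implementation. The function names are hypothetical, and the starting cluster ID of 13 mirrors the start_cluster_ids default documented for aggregate_zones.

```python
import math
from typing import Dict, List, Sequence

# Default cluster factors and weights listed in the text above.
DEFAULT_WEIGHTS = {'popden': 1.0, 'empden': 1.0, 'modeshare_NM': 100.0, 'modeshare_WT': 100.0}


def weighted_features(tazs: Sequence[Dict[str, float]], weights: Dict[str, float]) -> List[List[float]]:
    '''Z-score each factor across TAZs (an assumption), then multiply by its weight.'''
    cols = list(weights)
    stats = {}
    for c in cols:
        vals = [t[c] for t in tazs]
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals)) or 1.0
        stats[c] = (mu, sd)
    return [[weights[c] * (t[c] - stats[c][0]) / stats[c][1] for c in cols] for t in tazs]


def _sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, n_iter: int = 100) -> List[int]:
    '''Plain Lloyd k-means with deterministic farthest-first seeding.'''
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed with the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(far)
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_tazs(tazs, n_zones, weights=DEFAULT_WEIGHTS, start_cluster_id=13):
    '''Return one RSM zone ID per TAZ, numbered from start_cluster_id.'''
    labels = kmeans(weighted_features(tazs, weights), n_zones)
    return [start_cluster_id + lab for lab in labels]
```

      In this sketch the mode-share factors carry a weight of 100. Differences in non-motorized and walk-transit shares therefore dominate the distance calculation, which keeps TAZs with similar mode choice behavior in the same RSM zone.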

      After the clustering, the aggregator produces the TAZ/MGRA crosswalks from the original TAZs/MGRAs to the new RSM zones. The elementary and high school enrollments in the new RSM zone socioeconomic file are then checked and adjusted to prevent zero values.
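      A minimal sketch of the crosswalk step follows. It assumes the MGRA-to-TAZ mapping and the TAZ-to-RSM-zone cluster labels are plain dictionaries. The sketch only flags RSM zones whose summed enrollment is zero. The adjustment the aggregator then applies to those zones is not described here, so it is left out. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def mgra_to_rsm_crosswalk(mgra_to_taz: Dict[int, int], taz_to_rsm: Dict[int, int]) -> Dict[int, int]:
    '''Chain MGRA->TAZ with TAZ->RSM zone to get MGRA->RSM zone.'''
    return {mgra: taz_to_rsm[taz] for mgra, taz in mgra_to_taz.items()}


def zero_enrollment_zones(mgra_to_rsm: Dict[int, int], enrollment: Dict[int, float]) -> List[int]:
    '''List RSM zones whose summed enrollment is zero (candidates for adjustment).'''
    totals: Dict[int, float] = defaultdict(float)
    for mgra, zone in mgra_to_rsm.items():
        totals[zone] += enrollment.get(mgra, 0.0)
    return sorted(zone for zone, total in totals.items() if total == 0)
```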

      The user can also control from the properties file whether the zone aggregator runs. Once a baseline RSM run is established, other project-related RSM runs can be set up to skip the zone aggregator and reuse the zone system from the RSM baseline.

      "},{"location":"development.html#input-aggregator","title":"Input Aggregator","text":"

      The input aggregator module of RSM aggregates several input files, UEC (SOA) files, and non-ABM model outputs of the donor model to the new RSM zones. The main inputs to this module are:
      • the location of the donor model
      • the RSM socioeconomic file
      • the TAZ and MGRA crosswalks

      The module reads the original socioeconomic file and adds the intersection count and several density variables that were originally generated by the 4D module of the current ABM2+ model. This is done in RSM because the 4D module is skipped when running RSM. The module then uses the crosswalk between MGRAs and RSM zones to aggregate the original socioeconomic data to the new RSM zones, creating an RSM-specific socioeconomic file. Next, the module aggregates the following input files:

      • microMgraEquivMinutes.csv
      • microMgraTapEquivMinutes.csv
      • walkMgraTapEquivMinutes.csv
      • walkMgraEquivMinutes.csv
      • bikeTazLogsum.csv
      • bikeMgraLogsum.csv
      • zone.term
      • zones.park
      • tap.ptype
      • accessam.csv
      • ParkLocationAlts.csv
      • CrossBorderDestinationChoiceSoaAlternatives.csv
      • TourDcSoaDistanceAlts.csv
      • DestinationChoiceAlternatives.csv
      • SoaTazDistAlts.csv
      • TripMatrices.csv
      • transponderModelAccessibilities.csv
      • crossBorderTours.csv
      • internalExternalTrips.csv
      • visitorTours.csv
      • visitorTrips.csv
      • householdAVTrips.csv
      • crossBorderTrips.csv
      • TNCTrips.csv
      • airport_out.SAN.csv
      • airport_out.CBX.csv
      • TNCtrips.csv

      Each of the above files has its own aggregation method. Depending on the file, values are aggregated using the mean, the total (sum), or the maximum.
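      The mean/total/maximum rule can be sketched as a single grouping function that takes the aggregation method as a parameter. The text above does not list which method applies to which file, so the sketch leaves that choice to the caller. The row format (original zone ID, value) and the function name are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

# The three aggregation rules named in the text.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {'sum': sum, 'mean': mean, 'max': max}


def aggregate_to_rsm(rows: Iterable[Tuple[int, float]], crosswalk: Dict[int, int], method: str) -> Dict[int, float]:
    '''Group (original zone, value) rows by RSM zone and reduce each group with one rule.'''
    grouped: Dict[int, List[float]] = defaultdict(list)
    for zone, value in rows:
        grouped[crosswalk[zone]].append(value)
    reducer = AGGREGATORS[method]
    return {rsm_zone: reducer(values) for rsm_zone, values in grouped.items()}
```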

      "},{"location":"development.html#translate-demand","title":"Translate Demand","text":"

      The translate demand module of the RSM aggregates the non-resident demand matrices and trip tables to the new RSM zone structure. The inputs of this module include the path to the RSM model directory, the donor model directory, and the crosswalks. In particular, the module aggregates auto, transit, non-motorized, and other trips from the airport, cross border, internal-external, and visitor models. It also aggregates TNC vehicle trips and empty AV trips.
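      Aggregating a demand matrix amounts to adding every origin-destination cell into the cell of the corresponding RSM zone pair. The sketch below assumes a sparse dictionary keyed by (origin, destination), not the OMX or Emmebank matrices the module actually works with. The names are hypothetical. Total demand is preserved by construction.

```python
from collections import defaultdict
from typing import Dict, Tuple

OD = Tuple[int, int]


def aggregate_od(demand: Dict[OD, float], crosswalk: Dict[int, int]) -> Dict[OD, float]:
    '''Sum each (origin, destination) cell into its (RSM origin, RSM destination) cell.'''
    out: Dict[OD, float] = defaultdict(float)
    for (orig, dest), trips in demand.items():
        out[(crosswalk[orig], crosswalk[dest])] += trips
    return dict(out)
```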

      "},{"location":"development.html#intelligent-sampler","title":"Intelligent Sampler","text":"

      The intelligent sampler module is designed to intelligently sample households and persons from synthetic households and person data, considering accessibility metrics and other parameters. The main inputs to this module are the households file, person file, TAZ/MGRA crosswalks and the outputs are sampled households and person files. In the model properties file (sandag_abm.properties), the user can choose to run RSM sampler, specify the default sampling rate and minimum sampling rate for the RSM model run. The user also has the ability to sample specific zones at 100% by specifying them in the study area file.

      The sampler function follows these primary steps:

      1. Zone Mapping: The function maps zones from the synthetic households/person data to their corresponding RSM zones using crosswalk data.

      2. Household Sampling:

        • If accessibility data is missing (first iteration) or if the RSM sampler is turned off, a default sampling rate is applied to all RSM zones, with optional 100% sampling in the study area.
        • If accessibility data is available and the RSM sampler is turned on, the function calculates differences in accessibility metrics between the current and previous iterations. The sampling rates are determined based on these differences and are adjusted to be within specified bounds. The RSM zones of the study area are sampled at a 100% sampling rate.
      3. Households and Persons Selection: The function selects households based on the calculated sampling rates. It also selects persons associated with the sampled households.

      4. Output:

        • The selected households and persons are written to output CSV files in the specified output directory.
        • The function also computes and logs the total sampling rate, representing the proportion of selected households relative to the total number of households.

      Note that in the current RSM deployment, the sampler is set to use a 25% default sampling rate. The intelligent sampler needs further testing before it can be used to sample households based on changes in accessibility.
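      The sampler steps can be sketched as follows. The text states only that the rates depend on accessibility differences and are kept within bounds. The specific rule used here is an assumption: the absolute change, scaled by the largest change and clipped to [minimum rate, 1]. The function names and the data layout are also assumptions. Households are a dictionary of household ID to RSM zone, and persons are (person ID, household ID) pairs.

```python
import random
from typing import Dict, Iterable, List, Optional, Set, Tuple


def sampling_rates(
    zones: Iterable[int],
    default_rate: float,
    min_rate: float,
    study_area: Set[int],
    acc_now: Optional[Dict[int, float]] = None,
    acc_prev: Optional[Dict[int, float]] = None,
    sampler_on: bool = True,
) -> Dict[int, float]:
    '''Per-zone sampling rates; study-area zones are always sampled at 100%.'''
    zones = list(zones)
    if not sampler_on or acc_now is None or acc_prev is None:
        # First iteration or sampler off: one default rate everywhere.
        rates = {z: default_rate for z in zones}
    else:
        # Assumed rule: rate grows with the absolute accessibility change.
        diffs = {z: abs(acc_now[z] - acc_prev[z]) for z in zones}
        biggest = max(diffs.values()) or 1.0
        rates = {z: min(1.0, max(min_rate, d / biggest)) for z, d in diffs.items()}
    for z in study_area:
        if z in rates:
            rates[z] = 1.0
    return rates


def sample_households(households: Dict[int, int], rates: Dict[int, float], seed: int = 0) -> Tuple[Set[int], float]:
    '''Keep each household with its zone rate; also return the total sampling rate.'''
    rng = random.Random(seed)
    selected = {hh for hh, zone in households.items() if rng.random() < rates[zone]}
    return selected, len(selected) / len(households)


def sample_persons(persons: List[Tuple[int, int]], selected: Set[int]) -> List[int]:
    '''Keep persons (person_id, hh_id) whose household was sampled.'''
    return [pid for pid, hh in persons if hh in selected]
```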

      "},{"location":"development.html#intelligent-assembler","title":"Intelligent Assembler","text":"

      The intelligent assembler module assembles the trips of the RSM model run and scales them appropriately based on the sampling rate of the RSM zones. The main inputs to this module are:
      • the joint and individual trips from the donor and RSM model runs
      • the households file
      • the crosswalks for mapping zones
      • an optional study area file
      • a flag for running the assembler

      The assembler function follows these primary steps:

      1. Load Trip Files: The function reads the individual and joint trip data for the RSM run. If the assembler is set to run (flag run_assembler equals 1), the function also loads the corresponding trip data from the donor model run.

      2. Assemble Trips: It converts individual and joint trip data from both the RSM run and the original model run into a common table format using a merging process. It separates trips made by households in the RSM run and those that were not resimulated. Then, it combines these trips to create the final assembled trip data, including individual and joint trips.

      3. Evaluation of Trip Changes: The function calculates and evaluates the percentage change in total trips by mode for each home zone. It aggregates trips made by households in the RSM and original model runs and compares their trip counts by mode. This information is used to assess the stability of travel behavior in different zones.

      4. Alternative Behavior (If Assembler is Off): If the assembler is turned off (flag run_assembler equals 0), the function scales the RSM individual and joint trips based on the specified default sampling rate. This alternative behavior is intended to simulate all trips as if they were selected, eliminating the need for the assembler.

      5. Outputs: The function returns two outputs: individual trips containing the assembled individual trip data, and joint trips containing the assembled joint trip data. These data files are structured to align with the format required for further analysis or use by Java components.

      In summary, the RSM assembler module takes multiple trip datasets and assembles them to create a unified dataset for further analysis, accommodating cases where only a subset of households were resimulated. The function also evaluates changes in trip behavior across different zones.
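      A simplified sketch of the assembler's two modes and its trip-change check follows. It assumes trips are dictionaries with hh_id, home_zone, and mode keys. It attaches a hypothetical weight field for expansion. The actual CT-RAMP trip tables and merge logic are richer than this, and all names here are illustrative.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Trip = Dict[str, object]


def assemble_trips(rsm_trips: List[Trip], donor_trips: List[Trip], resimulated: Set[int],
                   run_assembler: bool, default_rate: float) -> List[Trip]:
    '''Combine RSM and donor trips, or expand RSM trips when the assembler is off.'''
    if run_assembler:
        # Households that were not resimulated keep their donor-model trips.
        kept = [t for t in donor_trips if t['hh_id'] not in resimulated]
        return [dict(t, weight=1.0) for t in rsm_trips + kept]
    # Assembler off: each RSM trip stands in for 1 / default_rate trips.
    return [dict(t, weight=1.0 / default_rate) for t in rsm_trips]


def pct_change_by_zone_mode(rsm_trips: List[Trip], donor_trips: List[Trip],
                            resimulated: Set[int]) -> Dict[Tuple[object, object], float]:
    '''Percent change in trips by (home zone, mode) for resimulated households.'''
    def counts(trips: List[Trip]) -> Counter:
        return Counter((t['home_zone'], t['mode']) for t in trips if t['hh_id'] in resimulated)

    new, old = counts(rsm_trips), counts(donor_trips)
    return {key: 100.0 * (new[key] - n) / n for key, n in old.items() if n}
```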

      "},{"location":"development.html#user-experience","title":"User Experience","text":"

      The RSM repurposes the ABM2+ Emme-based GUI. The options will be updated to reflect the RSM options, as will the input file locations and other parameters. The RSM user experience will, therefore, be nearly the same as the ABM2+ user experience.

      "},{"location":"userguide.html","title":"User Guide","text":""},{"location":"userguide.html#rsm-setup","title":"RSM Setup","text":"

      Below are the steps to setup an RSM scenario run:

      1. Set up an ABM run on the server\u2019s C drive* by using the ABM2+ release 14.2.2 scenario creation GUI located at T:\\ABM\\release\\ABM\\version_14_2_2\\dist\\createStudyAndScenario.exe.

        *Running the model from the T drive while setting it to run on the local drive causes an error. An issue has been created on GitHub.

      2. Open Anaconda Prompt and type the following command:

        python T:\\projects\\RSM\\setup\\setup_rsm.py [MODEL_RUN_DIRECTORY]

        Specifying the model run directory in the command line is optional. If it is not specified a dialog box will open asking the user to specify the model run.

      3. Change the inputs and properties as needed. Be sure to check the following:

        1. If running a new network, make sure the network files are correct
        2. Check that the RSM properties were appended to the property file and make sure the RSM properties are correct
        3. Check that the updated Tour Mode Choice UEC was copied over
      4. After opening Emme using start_emme_with_virtual_env.bat and opening the SANDAG toolbox in Modeller as usual, set the steps to skip all of the special market models and to run only 2 iterations. Most of these should be set automatically, though you may need to set it to skip the EE model manually.

        Figure 1: Steps to run in SANDAG model GUI for RSM run

      "},{"location":"userguide.html#debugging","title":"Debugging","text":"

      For crashes encountered in CT-RAMP, review the event log as usual. If a crash occurs during an RSM step, review rsm-logging.log, a log file that RSM creates in the LogFiles folder.

      "},{"location":"userguide.html#rsm-specific-changes","title":"RSM Specific Changes","text":""},{"location":"userguide.html#application","title":"Application","text":"
      • sandag_abm.jar
        • New CT-RAMP jar file with a few required Java code updates.
      "},{"location":"userguide.html#bin","title":"Bin","text":"
      • runRSMAccessibility.cmd
        • Runs CT-RAMP to compute the accessibility of each zone
      • runRSMAssembler.cmd
        • Runs the intelligent assembler
      • runRSMEmmebankMatrixAggregator.cmd
        • Opens the Emmebank of the donor model and aggregates the truck and external trip tables
      • runRSMInputAggregator.cmd
        • Aggregates various model inputs into the aggregated zone system creating
      • runRSMSampler.cmd
        • Runs the intelligent sampler that combines the donor model trip diaries with the travel behavior of the resampled households
      • runRSMSandagABM.cmd
        • Runs CT-RAMP on the sampled households
      • runRSMSandagABMTripTables.cmd
        • Builds trip tables from assembled trip data
      • runRSMSetProperty.cmd
        • Updates property file to read the accessibility file instead of building it
      • runRSMSetupUpdate.cmd
        • Updates several properties
      • runRSMTripMatrixAggregator.cm
        • Aggregates trip tables stored in OMX files from donor model
      • runRSMZoneAggregator.cmd
        • Runs the zone aggregator
      "},{"location":"userguide.html#emme_project","title":"Emme_project","text":"
      • start_emme_with_virtualenv.bat
        • New lines to call Python environments used in RSM scripts
      • scripts\\sandag_toolbox.mtbx
        • Updated toolbox with a master run script to call RSM steps
      "},{"location":"userguide.html#input","title":"Input","text":"
      • MGRASHAPE.zip
        • Zipped shapefile of the MGRAs (used in zone aggregator)
      "},{"location":"userguide.html#pythonemmetoolbox","title":"Python\\emme\\toolbox","text":"
      • master_run.py
        • Changed to include new model steps
      • import\\import_auto_demand.py
        • Changes to how the trip tables are read into the Emmebank
      • utilities\\databank_aggregator.py
        • Aggregates matrices stored in the Emmebank
      "},{"location":"userguide.html#new-properties","title":"New Properties","text":"
      • run.rsm.setup
        • Set to 1 if running the RSM setup steps and 0 otherwise
          • Zone aggregator
          • Input aggregator
          • Matrix aggregator
          • Emmebank aggregator
      • run.rsm
        • Set to 1 if running the RSM and 0 otherwise
      • run.rsm.zone.aggregator
        • If set to 1, the zone aggregator will be run. If set to 0, the zone system from a run specified in rsm.baseline.run.dir will be used.
      • rsm.baseline.run.dir
        • Baseline run to read in zone system from if not running zone aggregator
      • rsm.zones
        • Number of zones to use
      • External.zones
        • Number of external zones
      • Run.rsm.sampling
        • 1 if running the intelligent sampler and 0 if not. If set to 0, every zone will have the default sampling rate
      • Rsm.default.sampling.rate
        • Default sampling rate to use when running the intelligent sampler
      • Rsm.centroid.connector.start.id
        • Starting value of tcovid for new zonal connectors to aggregated zones
      • Full.modelrun.dir
        • Filepath of donor model
      • Taz.to.cluster.crosswalk.file
        • Maps TAZs to aggregated zones
      • Mgra.to.cluster.crosswalk.file
        • Maps MGRAs to aggregated zones
      • Cluster.zone.centroid.file
        • Latitude and longitude coordinates of aggregated zone centroids
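      To illustrate how these properties fit together, the sketch below reads key = value lines and collects the RSM switches. It assumes the values are written as 0/1 flags and a decimal sampling rate (for example 0.25). It lower-cases keys because the capitalization in the list above may not match the file. The parser is a simplified stand-in for Java properties syntax, with no escapes or line continuations. The function names are hypothetical.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    '''Minimal key = value reader; keys are lower-cased for lookup.'''
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(('#', '!')):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition('=')
        if sep:
            props[key.strip().lower()] = value.strip()
    return props


def rsm_settings(props: Dict[str, str]) -> dict:
    '''Pull the RSM switches described above into typed Python values.'''
    run_zone_agg = props['run.rsm.zone.aggregator'] == '1'
    return {
        'run_rsm': props['run.rsm'] == '1',
        'run_zone_aggregator': run_zone_agg,
        # Without the zone aggregator, the zone system comes from the baseline run.
        'zone_system_from': 'zone aggregator' if run_zone_agg else props['rsm.baseline.run.dir'],
        'n_zones': int(props['rsm.zones']),
        'n_external_zones': int(props['external.zones']),
        'run_sampler': props['run.rsm.sampling'] == '1',
        'default_sampling_rate': float(props['rsm.default.sampling.rate']),
    }
```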
      "},{"location":"userguide.html#new-files","title":"New Files","text":"
      1. study_area.csv:

        This optional file explicitly defines how to aggregate certain zones and, consequently, which zones not to aggregate. This is useful for project-level analysis, where a modeler may want higher resolution close to a project but does not need it further away.

        The file has two columns, taz and group. The taz column is the zone ID in the ABM zone system, and the group column indicates which RSM zone the ABM zone will belong to. The group value becomes the RSM MGRA ID, and the corresponding RSM TAZ ID is the MGRA ID plus the number of external zones. If a user does not want to aggregate any zones within the study area, each zone should have a distinct group ID.

        Presently, all RSM zones defined in the study area are sampled at 100%, and the remaining zones are sampled at the sampling rate set in the property file.

        Any zones not within the study area will be aggregated using the standard RSM zone aggregating algorithm.

        An example of how the study area file works is shown below (assuming 12 external zones):

        Figure 2: ABM Zones

        Table 1: study_area.csv

        taz  group
        1    1
        2    2
        3    3
        4    4
        5    5
        6    6

        Figure 3: Resulting RSM Zones

        For a practical example, see Figure 4, where a study area was defined as every zone within a half mile of a project. Note that within the study area, no zones were aggregated (as it was defined), but outside of the study area, aggregation occurred.

        Figure 4: Example Study Area
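      The study area rules can be sketched in a few lines. Rows sharing a group become one RSM zone, and the RSM TAZ ID is the group ID plus the number of external zones. The groups can also be expressed in the explicit_agg form accepted by aggregate_zones: an integer for a zone kept on its own, a list for zones merged together. This conversion is only an illustration; how the RSM scripts actually pass the study area to the aggregator is an assumption, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple, Union


def read_study_area(text: str, n_external: int) -> Tuple[Dict[int, List[int]], Dict[int, int], List[Union[int, List[int]]]]:
    '''Parse study_area.csv content into groups, RSM TAZ IDs, and an explicit_agg list.'''
    groups: Dict[int, List[int]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[int(row['group'])].append(int(row['taz']))
    # RSM TAZ ID = group (RSM MGRA) ID + number of external zones.
    rsm_taz_ids = {g: g + n_external for g in groups}
    # Singletons stay unaggregated; multi-zone groups are merged exactly as given.
    explicit_agg = [m[0] if len(m) == 1 else m for m in groups.values()]
    return dict(groups), rsm_taz_ids, explicit_agg
```

      With the six one-to-one rows of Table 1 and 12 external zones, each ABM zone stays on its own and the RSM TAZ IDs run from 13 to 18.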

      "},{"location":"visualizer.html","title":"Visualizer","text":""},{"location":"visualizer.html#introduction","title":"Introduction","text":"

      The team developed an RSM visualizer tool that allows users to summarize and compare metrics from multiple RSM model runs. It is a dashboard-style tool built using SimWrapper (an open source web-based data visualization tool for building disaggregate transportation simulations), and it also leverages SANDAG\u2019s Data Pipeline Tool. SimWrapper works by creating a mini file server to host reduced data summaries of travel model outputs. The dashboard is created via YAML files, which can be customized to automate interactive report summaries, such as charts, summary tables, and spatial maps.

      "},{"location":"visualizer.html#design","title":"Design","text":"

      Visualizer has three main components:

      • Data Pipeline
      • Post Processing
      • SimWrapper Dashboard
      "},{"location":"visualizer.html#data-pipeline","title":"Data Pipeline","text":"

      The SANDAG Data Pipeline Tool aims to aid in building data pipelines that ingest, transform, and summarize data by taking advantage of the parameterization of data pipelines. Rather than coding from scratch, the user configures a few files and the tool figures out the rest. The pipeline produces the desired model summaries in CSV format. See here to learn how the tool works. Note that the RSM visualizer currently supports a fixed set of summaries from the model. Additional summaries can be easily incorporated into the pipeline by modifying the settings, processor, and expression files.

      "},{"location":"visualizer.html#post-processing","title":"Post Processing","text":"

      Next, a post-processing script performs the data manipulations that are done outside of the data pipeline tool, preparing the data in the format required by SimWrapper. As with the data pipeline, users can modify this post-processing script to add new summaries and bring them into the SimWrapper dashboard.

      "},{"location":"visualizer.html#simwrapper","title":"SimWrapper","text":"

      Lastly, the summary files are consumed by SimWrapper to generate the dashboard. SimWrapper is a web platform that can display either individual full-page data visualizations or collections of visualizations in \u201cdashboard\u201d format. It expects simulation outputs to be local files on your filesystem. There is no need to upload the summary files to a centralized database or cloud server to create the dashboard.

      For setting up the visualization in SimWrapper, configuration files (in YAML format) are created that provide all the config details to get it up and running, such as which data to load, how to lay out the dashboard, what type of chart to create etc. Refer to SimWrapper documentation here to get more familiar with it.

      "},{"location":"visualizer.html#setup","title":"Setup","text":"

      The visualizer is currently deployed to compare 3 scenario runs at once. Running the data pipeline and post-processing for each scenario is controlled through the process_scenarios Python script, and the scenarios are configured in the scenarios.yaml file. Users will need to modify this YAML file to specify the scenarios they want to compare.

      There are two categories of scenarios to specify: RSM runs and ABM (donor model) runs. For each scenario run, specify the directories of its input and report folders in this configuration file. Files from the input and report folders are then used by the data pipeline tool and the post-processing step to create summaries in the processed folder of the SimWrapper directory. Additional scenarios can be compared by extending the configuration in this YAML file.

      "},{"location":"visualizer.html#visualization","title":"Visualization","text":"

      Currently there are five default visualization summaries in the visualizer:

      "},{"location":"visualizer.html#bar-charts","title":"Bar Charts","text":"

      These charts are for comparing VMT, mode shares, transit boardings and trip purpose by time-of-day distribution. Here is a snapshot of sample YAML configuration file for bar chart:

      You can add as many charts as you want to the layout. For each chart, specify a CSV file for the summaries. The column names in the configuration should match the column names in the CSV file. There are also other specifications for bar charts, which you can learn more about here.

      Here is how the visual looks in the dashboard:

      "},{"location":"visualizer.html#network-flows","title":"Network Flows","text":"

      These charts are for comparing flows and VMT on the network. You can compare any two scenarios on one network. Here is a snapshot of the configuration file:

      For each network you need the csv files for two scenario summaries and an underlying network file which should be in geojson format. The supporting script creates the geojson files from the model outputs for the SimWrapper. For more info on network visualization specification see here.

      Here is how the visual looks in the dashboard:

      "},{"location":"visualizer.html#sample-rate-map","title":"Sample Rate Map","text":"

      This visual is a map for showing the RSM sample rates for each zone. Here is a snapshot of the configuration [file]:

      For each map you need a csv file of sample rates and the map of zones in .shp format. For more info on network visualization specification see here.

      Here is how the visual looks in the dashboard:

      "},{"location":"visualizer.html#zero-car-map","title":"Zero Car Map","text":"

      This visual is a map for showing the zero-car household distribution. Here is a snapshot of the configuration file:

      For each map you need a csv file of household rates and the map of zones in .shp format. For more info on network visualization specification see here

      Here is how the visual looks in the dashboard:

      "},{"location":"visualizer.html#od-flows","title":"OD Flows","text":"

      This chart is for showing OD trip flows. Here is a snapshot of the configuration file:

      For each map you need a csv file of od trip flows and the map of zones in .shp format. For more info on network visualization specification see here

      Here is how the visual looks in the dashboard:

      You can also modify the data and configuration of each visual on the SimWrapper server. Each visual has a configuration button (see below), where you can add data and modify all the map configurations. You can also export these configurations to a YAML file for future use.

      "},{"location":"visualizer.html#how-to-run","title":"How to Run","text":"

      The first step in running the visualizer is to bring in the scenario files. Currently the visualizer is set up to compare three scenarios:
      • donor_model: the ABM run
      • rsm_base: the baseline (no-action) RSM run
      • rsm_scen: the project (action) RSM run

      • For each of the three scenarios, copy report folder from their respective scenario run to \u201cvisualizer/simwrapper/data/external/[scenario_name]/reports\u201d folder. For instance, for donor_model copy the report folder here.

      • Only for the RSM scenarios, copy mgra_crosswalk.csv and households.csv files from the scenario input folder and bring them to the input folder \u201cvisualizer/simwrapper/data/external/[scenario_name]/input\u201d. Next, change the name of the \u201chouseholds.csv\u201d to \u201chouseholds_orig.csv\u201d. At this point the input folder for RSM scenarios in the simwrapper folder should look like below:

      As mentioned earlier, if you wish to add more RSM scenarios for comparison, you can do so by modifying the scenarios.yaml file. Copy the rsm_scen section, paste it below, and change \u201crsm_scen\u201d to the new scenario name. Note that you will also need to add the new scenario\u2019s configuration to the Data Pipeline and Post-Processing steps.
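      The manual copy-and-rename steps for staging scenario files can also be scripted. This sketch makes several assumptions. Each scenario run keeps its outputs in a folder named report and its inputs in a folder named input. The visualizer directory contains simwrapper/data/external. The function name and arguments are hypothetical. It requires Python 3.8 or later for the dirs_exist_ok option of shutil.copytree.

```python
import shutil
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def stage_scenario(run_dir: PathLike, visualizer_dir: PathLike, scenario: str, is_rsm: bool) -> Path:
    '''Copy one scenario into visualizer/simwrapper/data/external/<scenario>.'''
    run_dir = Path(run_dir)
    dest = Path(visualizer_dir) / 'simwrapper' / 'data' / 'external' / scenario
    # Every scenario: copy the run report folder to .../<scenario>/reports.
    shutil.copytree(run_dir / 'report', dest / 'reports', dirs_exist_ok=True)
    if is_rsm:
        # RSM scenarios only: crosswalk plus households, renamed to households_orig.csv.
        inp = dest / 'input'
        inp.mkdir(parents=True, exist_ok=True)
        shutil.copy2(run_dir / 'input' / 'mgra_crosswalk.csv', inp / 'mgra_crosswalk.csv')
        shutil.copy2(run_dir / 'input' / 'households.csv', inp / 'households_orig.csv')
    return dest
```

      For example, stage_scenario(rsm_run_path, 'visualizer', 'rsm_scen', is_rsm=True) would stage the project RSM run. Here rsm_run_path is a placeholder for the scenario run directory.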

      Once you have copied the required scenario files and set up the configuration, you are ready to run the visualizer.

      • Open Anaconda prompt and change the directory to visualizer folder in your local RSM repository.

      • Run the process scenario script by typing command below and then press enter.

        python process_scenarios.py

      • Processing all the scenarios through the pipeline will take some time.

      • Once this script is run successfully, it creates the summary files for each scenario to feed into simwrapper.

      • Finally, open this link in the web browser - https://simwrapper.github.io/site/

      • Click on \u2018Enter Site\u2019 button, then click on \u2018add local folder\u2019 and add simwrapper directory (visualizer\\simwrapper) to run the SimWrapper Visualizer for RSM.

      "}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"index.html","title":"SANDAG Rapid Strategic Model","text":"

      Welcome to the SANDAG Rapid Strategic Model documentation site!

      "},{"location":"index.html#introduction","title":"Introduction","text":"

      The travel demand model SANDAG used for the 2021 regional plan, referred to as ABM2+, is one of the most sophisticated modeling tools used anywhere in the world. Its activity-based approach to representing travel is behaviorally rich, and land development and transportation infrastructure are represented in high-fidelity spatial detail. An operational shortcoming of ABM2+ is that it requires significant computational resources to carry out a simulation. A typical forecast year simulation of ABM2+ takes over 40 hours to complete on a high-end workstation (e.g., 48 physical computing cores and 256 gigabytes of RAM). The components of this runtime include:

      • Three iterations of the resident activity-based model, each about 6 hours
      • Four iterations of roadway and transit assignment, with each iteration taking about 90 minutes

      The computational time of ABM2+, and the likely computational time of the successor to ABM2+ (ABM3), hinders SANDAG\u2019s ability to carry out certain analyses in a timely manner. For example, if an analyst wants to explore 10 different roadway pricing schemes for a select corridor, a month of computation time would be required.

      SANDAG requires a tool capable of quickly approximating the outcomes of ABM2+. Therefore, a tool was built for this purpose, referred to henceforth as the Rapid Strategic Model (RSM). The primary objective of the RSM was to enhance the speed of the resident passenger component within the broader modeling system and produce results that closely aligned with ABM2+ for policy planning requirements.

      "},{"location":"index.html#use-cases-and-key-limitations","title":"Use Cases and Key Limitations","text":"

      Based on the set of tests done as part of this project, RSM performs well for regional-scale roadway projects (e.g., auto operating costs and mileage fees, TNC costs and wait times) and regional-scale transit projects (e.g., transit fare and headway changes). RSM also performed well for land-use change policies. Lastly, RSM was also tested for local roadway changes (e.g., managed lanes conversion) and local transit changes (e.g., a new BRT line), and the results indicate that those policies are reasonably represented by RSM as well.

      Here are some of the current limitations of RSM:

      • The scope of the RSM is “passenger” travel. Policies and/or infrastructure that primarily affect commercial travel (e.g., truck lanes) will not be well represented.
      • Minor recalibration of the mode choice model was necessary to match observed walk trips. Large changes to the number of zones will likely require further recalibration.
      • The spatial aggregation reduces the RSM’s ability to simulate infrastructure and/or policies that act at small scales (e.g., pedestrian infrastructure).
      • Policies related to the adoption of automated vehicles cannot currently be represented; RSM currently skips the Household AV Allocation module.
      • While the RSM has been tested, the testing has not been extensive, and more extensive testing is likely to surface additional issues. Additional testing will be required to evaluate whether RSM is a viable tool for other policies of interest to SANDAG.
      "},{"location":"api.html","title":"Application Programming Interface","text":""},{"location":"api.html#rsm.zone_agg.aggregate_zones","title":"aggregate_zones(mgra_gdf, method='kmeans', n_zones=2000, random_state=0, cluster_factors=None, cluster_factors_onehot=None, use_xy=True, explicit_agg=(), explicit_col='mgra', agg_instruction=None, start_cluster_ids=13)","text":"

      Aggregate zones.

      "},{"location":"api.html#rsm.zone_agg.aggregate_zones--parameters","title":"Parameters","text":"

      • mgra_gdf (GeoDataFrame): geometry and attributes of MGRAs.
      • method (str, default 'kmeans'): clustering method, one of 'kmeans', 'agglom', or 'agglom_adj'.
      • n_zones (int, default 2000): total number of aggregated zones, including any zones fixed through explicit_agg.
      • random_state (RandomState or int, default 0): random seed passed to the k-means algorithm.
      • cluster_factors (dict): mapping of numeric column names to weights; each weighted column is used as a clustering feature.
      • cluster_factors_onehot (dict): mapping of categorical column names to weights; each column is one-hot encoded and weighted before clustering.
      • use_xy (bool or float, default True): use the X and Y coordinates of MGRA centroids as cluster factors; use a float to scale the x-y coordinates from the CRS if needed.
      • explicit_agg (list[int or list], default ()): a list containing integers (individual MGRAs that should not be aggregated) or lists of integers (groups of MGRAs that should be aggregated exactly as given, with no fewer and no more).
      • explicit_col (str, default 'mgra'): the name of the column containing the IDs from explicit_agg, usually 'mgra' or 'taz'.
      • agg_instruction (dict): dictionary passed to pandas agg that specifies how to aggregate data columns.
      • start_cluster_ids (int, default 13): cluster IDs start at this value. It can be 1, but SANDAG typically reserves the smallest IDs for external zones, so a larger starting value is usual.
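      How explicit_agg, n_zones, and start_cluster_ids interact is easier to see in code. The standalone sketch below mirrors the ID bookkeeping in the source of aggregate_zones; explicit_cluster_ids is a hypothetical helper, not part of the RSM package, and it assumes every listed MGRA is present in mgra_gdf (the real function only counts explicit groups that actually occur in the data).

```python
from numbers import Number


def explicit_cluster_ids(explicit_agg, n_zones, start_cluster_ids=13):
    """Hypothetical helper mirroring the ID bookkeeping in aggregate_zones."""
    ids = {}
    n = start_cluster_ids
    for item in explicit_agg:
        if isinstance(item, Number):
            ids[item] = n          # a single MGRA kept as its own zone
        else:
            for mgra in item:
                ids[mgra] = n      # a group of MGRAs merged into exactly one zone
        n += 1
    # The clustering algorithm only creates the zones not fixed explicitly,
    # and its clusters are numbered right after the explicit ones.
    n_algorithm = n_zones - len(set(ids.values()))
    algorithm_ids = list(range(n, n + n_algorithm))
    return ids, algorithm_ids


ids, algorithm_ids = explicit_cluster_ids([5, [7, 8]], n_zones=6)
print(ids)            # {5: 13, 7: 14, 8: 14}
print(algorithm_ids)  # [15, 16, 17, 18]
```

      Explicit groups are numbered first, so the total number of zones stays at n_zones.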

      "},{"location":"api.html#rsm.zone_agg.aggregate_zones--returns","title":"Returns","text":"

      GeoDataFrame

      Source code in rsm/zone_agg.py
```python
# Module-level imports this function relies on (assumed; the rendered source
# shows only the function). merge_zone_data and logger are assumed to be
# defined or imported elsewhere in rsm/zone_agg.py.
from numbers import Number

import networkx as nx
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.preprocessing import OneHotEncoder


def aggregate_zones(
    mgra_gdf,
    method="kmeans",
    n_zones=2000,
    random_state=0,
    cluster_factors=None,
    cluster_factors_onehot=None,
    use_xy=True,
    explicit_agg=(),
    explicit_col="mgra",
    agg_instruction=None,
    start_cluster_ids=13,
):
    """
    Aggregate zones.

    Parameters
    ----------
    mgra_gdf : mgra_gdf (GeoDataFrame)
        Geometry and attributes of MGRAs
    method : method (str)
        One of {'kmeans', 'agglom', 'agglom_adj'}, default 'kmeans'
    n_zones : n_zones (int)
    random_state : random_state (RandomState or int)
    cluster_factors : cluster_factors (dict)
    cluster_factors_onehot : cluster_factors_onehot (dict)
    use_xy : use_xy (bool or float)
        Use X and Y coordinates as a cluster factor, use a float to scale the
        x-y coordinates from the CRS if needed.
    explicit_agg : explicit_agg (list[int or list])
        A list containing integers (individual MGRAs that should not be aggregated)
        or lists of integers (groups of MGRAs that should be aggregated exactly as
        given, with no fewer and no more)
    explicit_col : explicit_col (str)
        The name of the column containing the IDs from `explicit_agg`, usually
        'mgra' or 'taz'
    agg_instruction : agg_instruction (dict)
        Dictionary passed to pandas `agg` that says how to aggregate data columns.
    start_cluster_ids : start_cluster_ids (int, default 13)
        Cluster IDs start at this value. Can be 1, but typically SANDAG has the
        smallest IDs reserved for external zones, so starting at a greater value
        is typical.

    Returns
    -------
    GeoDataFrame
    """

    if cluster_factors is None:
        cluster_factors = {}

    # Assign IDs to explicitly specified MGRAs or MGRA groups first.
    n = start_cluster_ids
    if explicit_agg:
        explicit_agg_ids = {}
        for i in explicit_agg:
            if isinstance(i, Number):
                explicit_agg_ids[i] = n
            else:
                for j in i:
                    explicit_agg_ids[j] = n
            n += 1
        if explicit_col == mgra_gdf.index.name:
            mgra_gdf = mgra_gdf.reset_index()
            mgra_gdf.index = mgra_gdf[explicit_col]
        in_explicit = mgra_gdf[explicit_col].isin(explicit_agg_ids)
        mgra_gdf_algo = mgra_gdf.loc[~in_explicit].copy()
        mgra_gdf_explicit = mgra_gdf.loc[in_explicit].copy()
        mgra_gdf_explicit["cluster_id"] = mgra_gdf_explicit[explicit_col].map(
            explicit_agg_ids
        )
        # The clustering algorithm only creates the zones not fixed explicitly.
        n_zones_algorithm = n_zones - len(
            mgra_gdf_explicit["cluster_id"].value_counts()
        )
    else:
        mgra_gdf_algo = mgra_gdf.copy()
        mgra_gdf_explicit = None
        n_zones_algorithm = n_zones

    # Build the feature matrix: (optionally scaled) centroid coordinates plus
    # weighted numeric and one-hot encoded cluster factors.
    if use_xy:
        geometry = mgra_gdf_algo.centroid
        X = list(geometry.apply(lambda p: p.x))
        Y = list(geometry.apply(lambda p: p.y))
        factors = [np.asarray(X) * use_xy, np.asarray(Y) * use_xy]
    else:
        factors = []
    for cf, cf_wgt in cluster_factors.items():
        factors.append(cf_wgt * mgra_gdf_algo[cf].values.astype(np.float32))
    if cluster_factors_onehot:
        for cf, cf_wgt in cluster_factors_onehot.items():
            factors.append(cf_wgt * OneHotEncoder().fit_transform(mgra_gdf_algo[[cf]]))
        from scipy.sparse import hstack

        factors2d = []
        for j in factors:
            if j.ndim < 2:
                factors2d.append(np.expand_dims(j, -1))
            else:
                factors2d.append(j)
        data = hstack(factors2d).toarray()
    else:
        data = np.array(factors).T

    if method == "kmeans":
        kmeans = KMeans(n_clusters=n_zones_algorithm, random_state=random_state)
        kmeans.fit(data)
        cluster_id = kmeans.labels_
    elif method == "agglom":
        # Ward linkage requires Euclidean distances, which is the default metric;
        # the `affinity` argument was deprecated in scikit-learn 1.2 and later removed.
        agglom = AgglomerativeClustering(n_clusters=n_zones_algorithm, linkage="ward")
        agglom.fit_predict(data)
        cluster_id = agglom.labels_
    elif method == "agglom_adj":
        from libpysal.weights import Rook

        # Only allow merges between rook-adjacent MGRAs.
        w_rook = Rook.from_dataframe(mgra_gdf_algo)
        adj_mat = nx.adjacency_matrix(w_rook.to_networkx())
        agglom = AgglomerativeClustering(
            n_clusters=n_zones_algorithm,
            linkage="ward",
            connectivity=adj_mat,
        )
        agglom.fit_predict(data)
        cluster_id = agglom.labels_
    else:
        raise NotImplementedError(method)
    mgra_gdf_algo["cluster_id"] = cluster_id

    if mgra_gdf_explicit is None or len(mgra_gdf_explicit) == 0:
        combined = merge_zone_data(
            mgra_gdf_algo,
            agg_instruction,
            cluster_id="cluster_id",
        )
        combined["cluster_id"] = list(range(n, n + n_zones_algorithm))
    else:
        pending = []
        for df in [mgra_gdf_algo, mgra_gdf_explicit]:
            logger.info(f"... merging {len(df)}")
            pending.append(
                merge_zone_data(
                    df,
                    agg_instruction,
                    cluster_id="cluster_id",
                ).reset_index()
            )

        # Algorithm clusters are numbered right after the explicit ones.
        pending[0]["cluster_id"] = list(range(n, n + n_zones_algorithm))

        # Keep only the columns shared by both frames before concatenating.
        pending[0] = pending[0][
            [c for c in pending[1].columns if c in pending[0].columns]
        ]
        pending[1] = pending[1][
            [c for c in pending[0].columns if c in pending[1].columns]
        ]
        combined = pd.concat(pending, ignore_index=False)
    combined = combined.reset_index(drop=True)

    return combined
```
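      The feature matrix that aggregate_zones hands to the clustering step combines (optionally scaled) centroid coordinates with weighted numeric and one-hot encoded factors. The sketch below rebuilds that matrix for a toy table; build_cluster_features is a hypothetical helper that takes coordinates as plain lists instead of reading GeoDataFrame centroids, and it substitutes pandas.get_dummies and a dense numpy.hstack for the scikit-learn OneHotEncoder and sparse scipy.sparse.hstack used in the source.

```python
import numpy as np
import pandas as pd


def build_cluster_features(df, x, y, use_xy=True, cluster_factors=None, cluster_factors_onehot=None):
    """Hypothetical helper: assemble the matrix passed to the clustering step."""
    columns = []
    if use_xy:
        # A float use_xy rescales the CRS coordinates; True leaves them unchanged.
        columns.append(np.asarray(x, dtype=float) * use_xy)
        columns.append(np.asarray(y, dtype=float) * use_xy)
    for col, weight in (cluster_factors or {}).items():
        # Numeric factors enter as single weighted columns.
        columns.append(weight * df[col].to_numpy(dtype=np.float32))
    blocks = [c.reshape(-1, 1) for c in columns]
    for col, weight in (cluster_factors_onehot or {}).items():
        # Categorical factors become weighted 0/1 indicator columns (sorted categories).
        blocks.append(weight * pd.get_dummies(df[col]).to_numpy(dtype=float))
    return np.hstack(blocks)


mgras = pd.DataFrame({"emp": [10, 0, 5], "landuse": ["res", "com", "res"]})
features = build_cluster_features(
    mgras,
    x=[0.0, 1000.0, 2000.0],
    y=[0.0, 0.0, 500.0],
    use_xy=0.001,
    cluster_factors={"emp": 0.1},
    cluster_factors_onehot={"landuse": 2.0},
)
print(features.shape)  # (3, 5)
```

      Because k-means and Ward clustering use Euclidean distances across all columns, the weights (including a float use_xy) set the trade-off between spatial compactness and attribute similarity of the resulting zones.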
      "},{"location":"api.html#rsm.input_agg.agg_input_files","title":"agg_input_files(model_dir='.', rsm_dir='.', taz_cwk_file='taz_crosswalk.csv', mgra_cwk_file='mgra_crosswalk.csv', agg_zones=2000, ext_zones=12, input_files=['microMgraEquivMinutes.csv', 'microMgraTapEquivMinutes.csv', 'walkMgraTapEquivMinutes.csv', 'walkMgraEquivMinutes.csv', 'bikeTazLogsum.csv', 'bikeMgraLogsum.csv', 'zone.term', 'zones.park', 'tap.ptype', 'accessam.csv', 'ParkLocationAlts.csv', 'CrossBorderDestinationChoiceSoaAlternatives.csv', 'TourDcSoaDistanceAlts.csv', 'DestinationChoiceAlternatives.csv', 'SoaTazDistAlts.csv', 'TripMatrices.csv', 'transponderModelAccessibilities.csv', 'crossBorderTours.csv', 'internalExternalTrips.csv', 'visitorTours.csv', 'visitorTrips.csv', 'householdAVTrips.csv', 'crossBorderTrips.csv', 'TNCTrips.csv', 'airport_out.SAN.csv', 'airport_out.CBX.csv', 'TNCtrips.csv'])","text":""},{"location":"api.html#rsm.input_agg.agg_input_files--parameters","title":"Parameters","text":"

      • model_dir (path_like, default "."): path to the full model run.
      • rsm_dir (path_like, default "."): path to the RSM.
      • taz_cwk_file (CSV file, default taz_crosswalk.csv): TAZ-to-aggregated-zone crosswalk. Should be located in the RSM input folder.
      • mgra_cwk_file (CSV file, default mgra_crosswalk.csv): MGRA-to-aggregated-zone crosswalk. Should be located in the RSM input folder.
      • agg_zones (int, default 2000): number of aggregated zones.
      • ext_zones (int, default 12): number of external zones; TAZ-level alternatives files are generated for agg_zones + ext_zones zones.
      • input_files (list of CSV and other files): input files to be aggregated. Should include the following files: "microMgraEquivMinutes.csv", "microMgraTapEquivMinutes.csv", "walkMgraTapEquivMinutes.csv", "walkMgraEquivMinutes.csv", "bikeTazLogsum.csv", "bikeMgraLogsum.csv", "zone.term", "zones.park", "tap.ptype", "accessam.csv", "ParkLocationAlts.csv", "CrossBorderDestinationChoiceSoaAlternatives.csv", "TourDcSoaDistanceAlts.csv", "DestinationChoiceAlternatives.csv", "SoaTazDistAlts.csv", "TripMatrices.csv", "transponderModelAccessibilities.csv", "crossBorderTours.csv", "internalExternalTrips.csv", "visitorTours.csv", "visitorTrips.csv", "householdAVTrips.csv", "crossBorderTrips.csv", "TNCTrips.csv", "airport_out.SAN.csv", "airport_out.CBX.csv", "TNCtrips.csv"
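      Both crosswalks are read into plain dictionaries and applied with Series.map. The sketch below reproduces that step on an in-memory CSV; the crosswalk contents and the load_crosswalk helper are illustrative assumptions, and only the column names taz, mgra, and cluster_id come from the source.

```python
import io

import pandas as pd

# Hypothetical crosswalk contents; the real files live in the RSM input folder.
TAZ_CWK = io.StringIO(" TAZ , cluster_id\n1,1\n2,2\n13,13\n14,13\n15,14\n")


def load_crosswalk(source, key):
    """Hypothetical helper: return a {original id: aggregated zone id} dict."""
    df = pd.read_csv(source)
    # Normalize headers the same way agg_input_files does (strip, lower-case).
    df.columns = df.columns.str.strip().str.lower()
    return dict(zip(df[key], df["cluster_id"]))


taz_to_cluster = load_crosswalk(TAZ_CWK, "taz")
print(pd.Series([14, 15, 2]).map(taz_to_cluster).tolist())  # [13, 14, 2]
```

      Any ID missing from a crosswalk is mapped to NaN rather than raising an error, so an incomplete crosswalk silently produces missing zone IDs in the aggregated files.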

      "},{"location":"api.html#rsm.input_agg.agg_input_files--returns","title":"Returns","text":"

      None. The aggregated files are written to the RSM input, output, and uec directories.
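      Several alternatives files are not aggregated from full-model inputs but regenerated from the zone counts: the TAZ-level files (TourDcSoaDistanceAlts.csv, SoaTazDistAlts.csv) get agg_zones + ext_zones rows and DestinationChoiceAlternatives.csv gets agg_zones rows, each numbering alternatives consecutively from 1. A minimal sketch of that step with the default zone counts; alternatives_table is a hypothetical helper.

```python
import pandas as pd


def alternatives_table(n_zones, id_col):
    """Hypothetical helper: one alternative per zone, numbered 1..n_zones."""
    ids = range(1, n_zones + 1)
    return pd.DataFrame({"a": ids, id_col: ids})


agg_zones, ext_zones = 2000, 12
taz_alts = alternatives_table(agg_zones + ext_zones, "dest")  # TAZ-level files
mgra_alts = alternatives_table(agg_zones, "mgra")             # MGRA-level file
print(len(taz_alts), len(mgra_alts))  # 2012 2000
```

      These tables assume the aggregated zone IDs are exactly the consecutive integers covered by the ranges; the function does not check this against the crosswalks.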

      Source code in rsm/input_agg.py
      def agg_input_files(\n    model_dir = \".\", \n    rsm_dir = \".\",\n    taz_cwk_file = \"taz_crosswalk.csv\",\n    mgra_cwk_file = \"mgra_crosswalk.csv\",\n    agg_zones=2000,\n    ext_zones=12,\n    input_files = [\"microMgraEquivMinutes.csv\", \"microMgraTapEquivMinutes.csv\", \n    \"walkMgraTapEquivMinutes.csv\", \"walkMgraEquivMinutes.csv\", \"bikeTazLogsum.csv\",\n    \"bikeMgraLogsum.csv\", \"zone.term\", \"zones.park\", \"tap.ptype\", \"accessam.csv\",\n    \"ParkLocationAlts.csv\", \"CrossBorderDestinationChoiceSoaAlternatives.csv\", \n    \"TourDcSoaDistanceAlts.csv\", \"DestinationChoiceAlternatives.csv\", \"SoaTazDistAlts.csv\",\n    \"TripMatrices.csv\", \"transponderModelAccessibilities.csv\", \"crossBorderTours.csv\", \n    \"internalExternalTrips.csv\", \"visitorTours.csv\", \"visitorTrips.csv\", \"householdAVTrips.csv\", \n    \"crossBorderTrips.csv\", \"TNCTrips.csv\", \"airport_out.SAN.csv\", \"airport_out.CBX.csv\", \n    \"TNCtrips.csv\"]\n    ):\n\n\"\"\"\n        Parameters\n        ----------\n        model_dir : model_dir (path_like)\n            path to full model run, default \".\"\n        rsm_dir : rsm_dir (path_like)\n            path to RSM, default \".\"\n        taz_cwk_file : taz_cwk_file (csv file)\n            default taz_crosswalk.csv\n            taz to aggregated zones file. Should be located in RSM input folder\n        mgra_cwk_file : mgra_cwk_file (csv file)\n            default mgra_crosswalk.csv\n            mgra to aggregated zones file. Should be located in RSM input folder\n        input_files : input_files (csv + other files)\n            list of input files to be aggregated. 
\n            Should include the following files\n                \"microMgraEquivMinutes.csv\", \"microMgraTapEquivMinutes.csv\", \n                \"walkMgraTapEquivMinutes.csv\", \"walkMgraEquivMinutes.csv\", \"bikeTazLogsum.csv\",\n                \"bikeMgraLogsum.csv\", \"zone.term\", \"zones.park\", \"tap.ptype\", \"accessam.csv\",\n                \"ParkLocationAlts.csv\", \"CrossBorderDestinationChoiceSoaAlternatives.csv\",\n                \"TourDcSoaDistanceAlts.csv\", \"DestinationChoiceAlternatives.csv\", \"SoaTazDistAlts.csv\",\n                \"TripMatrices.csv\", \"transponderModelAccessibilities.csv\", \"crossBorderTours.csv\",\n                \"internalExternalTrips.csv\", \"visitorTours.csv\", \"visitorTrips.csv\", \"householdAVTrips.csv\",\n                \"crossBorderTrips.csv\", \"TNCTrips.csv\", \"airport_out.SAN.csv\", \"airport_out.CBX.csv\",\n                \"TNCtrips.csv\"\n\n        Returns\n        -------\n        Aggregated files in the RSM input/output/uec directory\n    \"\"\"\n\n    df_clusters = pd.read_csv(os.path.join(rsm_dir, \"input\", taz_cwk_file))\n    df_clusters.columns= df_clusters.columns.str.strip().str.lower()\n    dict_clusters = dict(zip(df_clusters['taz'], df_clusters['cluster_id']))\n\n    mgra_cwk = pd.read_csv(os.path.join(rsm_dir, \"input\", mgra_cwk_file))\n    mgra_cwk.columns= mgra_cwk.columns.str.strip().str.lower()\n    mgra_cwk = dict(zip(mgra_cwk['mgra'], mgra_cwk['cluster_id']))\n\n    taz_zones = int(agg_zones) + int(ext_zones)\n    mgra_zones = int(agg_zones)\n\n    # aggregating microMgraEquivMinutes.csv\n    if \"microMgraEquivMinutes.csv\" in input_files:\n        logging.info(\"Aggregating - microMgraEquivMinutes.csv\")\n        df_mm_eqmin = pd.read_csv(os.path.join(model_dir, \"output\", \"microMgraEquivMinutes.csv\"))\n        df_mm_eqmin['i_new'] = df_mm_eqmin['i'].map(mgra_cwk)\n        df_mm_eqmin['j_new'] = df_mm_eqmin['j'].map(mgra_cwk)\n\n        df_mm_eqmin_agg = 
df_mm_eqmin.groupby(['i_new', 'j_new'])['walkTime', 'dist', 'mmTime', 'mmCost', 'mtTime', 'mtCost',\n       'mmGenTime', 'mtGenTime', 'minTime'].mean().reset_index()\n\n        df_mm_eqmin_agg = df_mm_eqmin_agg.rename(columns = {'i_new' : 'i', 'j_new' : 'j'})\n        df_mm_eqmin_agg.to_csv(os.path.join(rsm_dir, \"input\", \"microMgraEquivMinutes.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"microMgraEquivMinutes.csv\")\n\n\n    # aggregating microMgraTapEquivMinutes.csv\"   \n    if \"microMgraTapEquivMinutes.csv\" in input_files:\n        logging.info(\"Aggregating - microMgraTapEquivMinutes.csv\")\n        df_mm_tap = pd.read_csv(os.path.join(model_dir, \"output\", \"microMgraTapEquivMinutes.csv\"))\n        df_mm_tap['mgra'] = df_mm_tap['mgra'].map(mgra_cwk)\n\n        df_mm_tap_agg = df_mm_tap.groupby(['mgra', 'tap'])['walkTime', 'dist', 'mmTime', 'mmCost', 'mtTime',\n       'mtCost', 'mmGenTime', 'mtGenTime', 'minTime'].mean().reset_index()\n\n        df_mm_tap_agg.to_csv(os.path.join(rsm_dir, \"input\", \"microMgraTapEquivMinutes.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"microMgraTapEquivMinutes.csv\")\n\n    # aggregating walkMgraTapEquivMinutes.csv\n    if \"walkMgraTapEquivMinutes.csv\" in input_files:\n        logging.info(\"Aggregating - walkMgraTapEquivMinutes.csv\")\n        df_wlk_mgra_tap = pd.read_csv(os.path.join(model_dir, \"output\", \"walkMgraTapEquivMinutes.csv\"))\n        df_wlk_mgra_tap[\"mgra\"] = df_wlk_mgra_tap[\"mgra\"].map(mgra_cwk)\n\n        df_wlk_mgra_agg = df_wlk_mgra_tap.groupby([\"mgra\", \"tap\"])[\"boardingPerceived\", \"boardingActual\",\"alightingPerceived\",\"alightingActual\",\"boardingGain\",\"alightingGain\"].mean().reset_index()\n        df_wlk_mgra_agg.to_csv(os.path.join(rsm_dir, \"input\", \"walkMgraTapEquivMinutes.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"walkMgraTapEquivMinutes.csv\")\n\n    # aggregating walkMgraEquivMinutes.csv\n    if 
\"walkMgraEquivMinutes.csv\" in input_files:\n        logging.info(\"Aggregating - walkMgraEquivMinutes.csv\")\n        df_wlk_min = pd.read_csv(os.path.join(model_dir, \"output\", \"walkMgraEquivMinutes.csv\"))\n        df_wlk_min[\"i\"] = df_wlk_min[\"i\"].map(mgra_cwk)\n        df_wlk_min[\"j\"] = df_wlk_min[\"j\"].map(mgra_cwk)\n\n        df_wlk_min_agg = df_wlk_min.groupby([\"i\", \"j\"])[\"percieved\",\"actual\", \"gain\"].mean().reset_index()\n\n        df_wlk_min_agg.to_csv(os.path.join(rsm_dir, \"input\", \"walkMgraEquivMinutes.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"walkMgraEquivMinutes.csv\")\n\n    # aggregating biketazlogsum\n    if \"bikeTazLogsum.csv\" in input_files:\n        logging.info(\"Aggregating - bikeTazLogsum.csv\")\n        bike_taz = pd.read_csv(os.path.join(model_dir, \"output\", \"bikeTazLogsum.csv\"))\n\n        bike_taz[\"i\"] = bike_taz[\"i\"].map(dict_clusters)\n        bike_taz[\"j\"] = bike_taz[\"j\"].map(dict_clusters)\n\n        bike_taz_agg = bike_taz.groupby([\"i\", \"j\"])[\"logsum\", \"time\"].mean().reset_index()\n        bike_taz_agg.to_csv(os.path.join(rsm_dir, \"input\", \"bikeTazLogsum.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"bikeTazLogsum.csv\")\n\n    # aggregating bikeMgraLogsum.csv\n    if \"bikeMgraLogsum.csv\" in input_files:\n        logging.info(\"Aggregating - bikeMgraLogsum.csv\")\n        bike_mgra = pd.read_csv(os.path.join(model_dir, \"output\", \"bikeMgraLogsum.csv\"))\n        bike_mgra[\"i\"] = bike_mgra[\"i\"].map(mgra_cwk)\n        bike_mgra[\"j\"] = bike_mgra[\"j\"].map(mgra_cwk)\n\n        bike_mgra_agg = bike_mgra.groupby([\"i\", \"j\"])[\"logsum\", \"time\"].mean().reset_index()\n        bike_mgra_agg.to_csv(os.path.join(rsm_dir, \"input\", \"bikeMgraLogsum.csv\"), index = False)\n    else:\n        raise FileNotFoundError(\"bikeMgraLogsum.csv\")\n\n    # aggregating zone.term\n    if \"zone.term\" in input_files:\n        
logging.info(\"Aggregating - zone.term\")\n        df_zone_term = pd.read_fwf(os.path.join(model_dir, \"input\", \"zone.term\"), header = None)\n        df_zone_term.columns = [\"taz\", \"terminal_time\"]\n\n        df_agg = pd.merge(df_zone_term, df_clusters, on = \"taz\", how = 'left')\n        df_zones_agg = df_agg.groupby([\"cluster_id\"])['terminal_time'].max().reset_index()\n\n        df_zones_agg.columns = [\"taz\", \"terminal_time\"]\n        df_zones_agg.to_fwf(os.path.join(rsm_dir, \"input\", \"zone.term\"))\n\n    else:\n        raise FileNotFoundError(\"zone.term\")\n\n    # aggregating zones.park\n    if \"zones.park\" in input_files:\n        logging.info(\"Aggregating - zone.park\")\n        df_zones_park = pd.read_fwf(os.path.join(model_dir, \"input\", \"zone.park\"), header = None)\n        df_zones_park.columns = [\"taz\", \"park_zones\"]\n\n        df_zones_park_agg = pd.merge(df_zones_park, df_clusters, on = \"taz\", how = 'left')\n        df_zones_park_agg = df_zones_park_agg.groupby([\"cluster_id\"])['park_zones'].max().reset_index()\n        df_zones_park_agg.columns = [\"taz\", \"park_zones\"]\n        df_zones_park_agg.to_fwf(os.path.join(rsm_dir, \"input\", \"zone.park\"))\n\n    else:\n        raise FileNotFoundError(\"zone.park\")\n\n\n    # aggregating tap.ptype \n    if \"tap.ptype\" in input_files:\n        logging.info(\"Aggregating - tap.ptype\")\n        df_tap_ptype = pd.read_fwf(os.path.join(model_dir, \"input\", \"tap.ptype\"), header = None)\n        df_tap_ptype.columns = [\"tap\", \"lot id\", \"parking type\", \"taz\", \"capacity\", \"distance\", \"transit mode\"]\n\n        df_tap_ptype = pd.merge(df_tap_ptype, df_clusters, on = \"taz\", how = 'left')\n\n        df_tap_ptype = df_tap_ptype[[\"tap\", \"lot id\", \"parking type\", \"cluster_id\", \"capacity\", \"distance\", \"transit mode\"]]\n        df_tap_ptype = df_tap_ptype.rename(columns = {\"cluster_id\": \"taz\"})\n        #df_tap_ptype.to_fwf(os.path.join(rsm_dir, 
\"input\", \"tap.ptype\"))\n\n        widths = [5, 6, 6, 5, 5, 5, 3]\n\n        with open(os.path.join(rsm_dir, \"input\", \"tap.ptype\"), 'w') as f:\n            for index, row in df_tap_ptype.iterrows():\n                field1 = str(row[0]).rjust(widths[0])\n                field2 = str(row[1]).rjust(widths[1])\n                field3 = str(row[2]).rjust(widths[2])\n                field4 = str(row[3]).rjust(widths[3])\n                field5 = str(row[4]).rjust(widths[4])\n                field6 = str(row[5]).rjust(widths[5])\n                field7 = str(row[6]).rjust(widths[6])\n                f.write(f'{field1}{field2}{field3}{field4}{field5}{field6}{field7}\\n')\n\n    else:\n        raise FileNotFoundError(\"tap.ptype\")\n\n    #aggregating accessam.csv\n    if \"accessam.csv\" in input_files:\n        logging.info(\"Aggregating - accessam.csv\")\n        df_acc = pd.read_csv(os.path.join(model_dir, \"input\", \"accessam.csv\"), header = None)\n        df_acc.columns = ['TAZ', 'TAP', 'TIME', 'DISTANCE', 'MODE']\n\n        df_acc['TAZ'] = df_acc['TAZ'].map(dict_clusters)\n        df_acc_agg = df_acc.groupby(['TAZ', 'TAP', 'MODE'])['TIME', 'DISTANCE'].mean().reset_index()\n        df_acc_agg = df_acc_agg[[\"TAZ\", \"TAP\", \"TIME\", \"DISTANCE\", \"MODE\"]]\n\n        df_acc_agg.to_csv(os.path.join(rsm_dir, \"input\", \"accessam.csv\"), index = False, header =False)\n    else:\n        raise FileNotFoundError(\"accessam.csv\")\n\n    # aggregating ParkLocationAlts.csv\n    if \"ParkLocationAlts.csv\" in input_files:\n        logging.info(\"Aggregating - ParkLocationAlts.csv\")\n        df_park = pd.read_csv(os.path.join(model_dir, \"uec\", \"ParkLocationAlts.csv\"))\n        df_park['mgra_new'] = df_park[\"mgra\"].map(mgra_cwk)\n        df_park_agg = df_park.groupby([\"mgra_new\"])[\"parkarea\"].min().reset_index() # assuming 1 is \"parking\" and 2 is \"no parking\"\n        df_park_agg['a'] = [i+1 for i in range(len(df_park_agg))]\n\n        
df_park_agg.columns = [\"a\", \"mgra\", \"parkarea\"]\n        df_park_agg.to_csv(os.path.join(rsm_dir, \"uec\", \"ParkLocationAlts.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"ParkLocationAlts.csv\")\n\n    # aggregating CrossBorderDestinationChoiceSoaAlternatives.csv\n    if \"CrossBorderDestinationChoiceSoaAlternatives.csv\" in input_files:\n        logging.info(\"Aggregating - CrossBorderDestinationChoiceSoaAlternatives.csv\")\n        df_cb = pd.read_csv(os.path.join(model_dir, \"uec\",\"CrossBorderDestinationChoiceSoaAlternatives.csv\"))\n\n        df_cb[\"mgra_entry\"] = df_cb[\"mgra_entry\"].map(mgra_cwk)\n        df_cb[\"mgra_return\"] = df_cb[\"mgra_return\"].map(mgra_cwk)\n        df_cb[\"a\"] = df_cb[\"a\"].map(mgra_cwk)\n\n        df_cb = pd.merge(df_cb, df_clusters, left_on = \"dest\", right_on = \"taz\", how = 'left')\n        df_cb = df_cb.drop(columns = [\"dest\", \"taz\"])\n        df_cb = df_cb.rename(columns = {'cluster_id' : 'dest'})\n\n        df_cb_final  = df_cb.drop_duplicates()\n\n        df_cb_final = df_cb_final[[\"a\", \"dest\", \"poe\", \"mgra_entry\", \"mgra_return\", \"poe_taz\"]]\n        df_cb_final.to_csv(os.path.join(rsm_dir, \"uec\", \"CrossBorderDestinationChoiceSoaAlternatives.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"CrossBorderDestinationChoiceSoaAlternatives.csv\")\n\n    # aggregating households.csv\n    if \"households.csv\" in input_files:\n        logging.info(\"Aggregating - households.csv\")\n        df_hh = pd.read_csv(os.path.join(model_dir, \"input\", \"households.csv\"))\n        df_hh[\"mgra\"] = df_hh[\"mgra\"].map(mgra_cwk)\n        df_hh[\"taz\"] = df_hh[\"taz\"].map(dict_clusters)\n\n        df_hh.to_csv(os.path.join(rsm_dir, \"input\", \"households.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"households.csv\")\n\n    # aggregating ShadowPricingOutput_school_9.csv\n    if \"ShadowPricingOutput_school_9.csv\" in input_files:\n        
logging.info(\"Aggregating - ShadowPricingOutput_school_9.csv\")\n        df_sp_sch = pd.read_csv(os.path.join(model_dir, \"input\", \"ShadowPricingOutput_school_9.csv\"))\n\n        agg_instructions = {}\n        for col in df_sp_sch.columns:\n            if \"size\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n            if \"shadowPrices\" in col:\n                agg_instructions.update({col: \"max\"})\n\n            if \"_origins\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n            if \"_modeledDests\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n        df_sp_sch['mgra'] = df_sp_sch['mgra'].map(mgra_cwk)\n        df_sp_sch_agg = df_sp_sch.groupby(['mgra']).agg(agg_instructions).reset_index()\n\n        alt = list(df_sp_sch_agg['mgra'])\n        df_sp_sch_agg.insert(loc=0, column=\"alt\", value=alt)\n        df_sp_sch_agg.loc[len(df_sp_agg.index)] = 0\n\n        df_sp_sch_agg.to_csv(os.path.join(rsm_dir, \"input\", \"ShadowPricingOutput_school_9.csv\"), index=False)\n\n    else:\n        FileNotFoundError(\"ShadowPricingOutput_school_9.csv\")\n\n    # aggregating ShadowPricingOutput_work_9.csv\n    if \"ShadowPricingOutput_work_9.csv\" in input_files:\n        logging.info(\"Aggregating - ShadowPricingOutput_work_9.csv\")\n        df_sp_wrk = pd.read_csv(os.path.join(model_dir, \"input\", \"ShadowPricingOutput_work_9.csv\"))\n\n        agg_instructions = {}\n        for col in df_sp_wrk.columns:\n            if \"size\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n            if \"shadowPrices\" in col:\n                agg_instructions.update({col: \"max\"})\n\n            if \"_origins\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n            if \"_modeledDests\" in col:\n                agg_instructions.update({col: \"sum\"})\n\n        df_sp_wrk['mgra'] = df_sp_wrk['mgra'].map(mgra_cwk)\n\n        df_sp_wrk_agg = 
df_sp_wrk.groupby(['mgra']).agg(agg_instructions).reset_index()\n\n        alt = list(df_sp_wrk_agg['mgra'])\n        df_sp_wrk_agg.insert(loc=0, column=\"alt\", value=alt)\n\n        df_sp_wrk_agg.loc[len(df_sp_wrk_agg.index)] = 0\n\n        df_sp_wrk_agg.to_csv(os.path.join(rsm_dir, \"input\", \"ShadowPricingOutput_work_9.csv\"), index=False)\n\n    else:\n        FileNotFoundError(\"ShadowPricingOutput_work_9.csv\")\n\n    if \"TourDcSoaDistanceAlts.csv\" in input_files:\n        logging.info(\"Aggregating - TourDcSoaDistanceAlts.csv\")\n        df_TourDcSoaDistanceAlts = pd.DataFrame({\"a\" : range(1,taz_zones+1), \"dest\" : range(1, taz_zones+1)})\n        df_TourDcSoaDistanceAlts.to_csv(os.path.join(rsm_dir, \"uec\", \"TourDcSoaDistanceAlts.csv\"), index=False)\n\n    if \"DestinationChoiceAlternatives.csv\" in input_files:\n        logging.info(\"Aggregating - DestinationChoiceAlternatives.csv\")\n        df_DestinationChoiceAlternatives = pd.DataFrame({\"a\" : range(1,mgra_zones+1), \"mgra\" : range(1, mgra_zones+1)})\n        df_DestinationChoiceAlternatives.to_csv(os.path.join(rsm_dir, \"uec\", \"DestinationChoiceAlternatives.csv\"), index=False)\n\n    if \"SoaTazDistAlts.csv\" in input_files:\n        logging.info(\"Aggregating - SoaTazDistAlts.csv\")\n        df_SoaTazDistAlts = pd.DataFrame({\"a\" : range(1,taz_zones+1), \"dest\" : range(1, taz_zones+1)})\n        df_SoaTazDistAlts.to_csv(os.path.join(rsm_dir, \"uec\", \"SoaTazDistAlts.csv\"), index=False)\n\n    if \"TripMatrices.csv\" in input_files:\n        logging.info(\"Aggregating - TripMatrices.csv\")\n        trips = pd.read_csv(os.path.join(model_dir,\"output\", \"TripMatrices.csv\"))\n        trips['i'] = trips['i'].map(dict_clusters)\n        trips['j'] = trips['j'].map(dict_clusters)\n\n        cols = list(trips.columns)\n        cols.remove(\"i\")\n        cols.remove(\"j\")\n\n        trips_df = trips.groupby(['i', 'j'])[cols].sum().reset_index()\n        
trips_df.to_csv(os.path.join(rsm_dir, \"output\", \"TripMatrices.csv\"), index = False)\n\n    else:\n        FileNotFoundError(\"TripMatrices.csv\")\n\n    if \"transponderModelAccessibilities.csv\" in input_files:\n        logging.info(\"Aggregating - transponderModelAccessibilities.csv\")\n        tran_access = pd.read_csv(os.path.join(model_dir, \"output\", \"transponderModelAccessibilities.csv\"))\n        tran_access['TAZ'] = tran_access['TAZ'].map(dict_clusters)\n\n        tran_access_agg = tran_access.groupby(['TAZ'])['DIST','AVGTTS','PCTDETOUR'].mean().reset_index()\n        tran_access_agg.to_csv(os.path.join(rsm_dir, \"output\",\"transponderModelAccessibilities.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"transponderModelAccessibilities.csv\")\n\n    if \"crossBorderTours.csv\" in input_files:\n        logging.info(\"Aggregating - crossBorderTours.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"crossBorderTours.csv\"))\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df['originTAZ'] = df['originTAZ'].map(dict_clusters)\n        df['destinationTAZ'] = df['destinationTAZ'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"crossBorderTours.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"crossBorderTours.csv\")\n\n    if \"crossBorderTrips.csv\" in input_files:\n        logging.info(\"Aggregating - crossBorderTrips.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"crossBorderTrips.csv\"))\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df['originTAZ'] = df['originTAZ'].map(dict_clusters)\n        df['destinationTAZ'] = df['destinationTAZ'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"crossBorderTrips.csv\"), index = False)\n\n    else:\n 
       raise FileNotFoundError(\"crossBorderTrips.csv\")\n\n    if \"internalExternalTrips.csv\" in input_files:\n        logging.info(\"Aggregating - internalExternalTrips.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"internalExternalTrips.csv\"))\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df['originTAZ'] = df['originTAZ'].map(dict_clusters)\n        df['destinationTAZ'] = df['destinationTAZ'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"internalExternalTrips.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"internalExternalTrips.csv\")\n\n    if \"visitorTours.csv\" in input_files:\n        logging.info(\"Aggregating - visitorTours.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"visitorTours.csv\"))\n\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"visitorTours.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"visitorTours.csv\")\n\n    if \"visitorTrips.csv\" in input_files:\n        logging.info(\"Aggregating - visitorTrips.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"visitorTrips.csv\"))\n\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"visitorTrips.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"visitorTrips.csv\")\n\n    if \"householdAVTrips.csv\" in input_files:\n        logging.info(\"Aggregating - householdAVTrips.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"householdAVTrips.csv\"))\n        #print(os.path.join(model_dir, \"output\", \"householdAVTrips.csv\"))\n        df['orig_mgra'] = 
df['orig_mgra'].map(mgra_cwk)\n        df['dest_gra'] = df['dest_gra'].map(mgra_cwk)\n\n        df['trip_orig_mgra'] = df['trip_orig_mgra'].map(mgra_cwk)\n        df['trip_dest_mgra'] = df['trip_dest_mgra'].map(mgra_cwk)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"householdAVTrips.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"householdAVTrips.csv\")\n\n    if \"airport_out.CBX.csv\" in input_files:\n        logging.info(\"Aggregating - airport_out.CBX.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"airport_out.CBX.csv\"))\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df['originTAZ'] = df['originTAZ'].map(dict_clusters)\n        df['destinationTAZ'] = df['destinationTAZ'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"airport_out.CBX.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"airport_out.CBX.csv\")\n\n    if \"airport_out.SAN.csv\" in input_files:\n        logging.info(\"Aggregating - airport_out.SAN.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"airport_out.SAN.csv\"))\n        df['originMGRA'] = df['originMGRA'].map(mgra_cwk)\n        df['destinationMGRA'] = df['destinationMGRA'].map(mgra_cwk)\n\n        df['originTAZ'] = df['originTAZ'].map(dict_clusters)\n        df['destinationTAZ'] = df['destinationTAZ'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"airport_out.SAN.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"airport_out.SAN.csv\")\n\n    if \"TNCtrips.csv\" in input_files:\n        logging.info(\"Aggregating - TNCtrips.csv\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", \"TNCtrips.csv\"))\n        df['originMgra'] = df['originMgra'].map(mgra_cwk)\n        df['destinationMgra'] = df['destinationMgra'].map(mgra_cwk)\n\n        df['originTaz'] = 
df['originTaz'].map(dict_clusters)\n        df['destinationTaz'] = df['destinationTaz'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\", \"TNCtrips.csv\"), index = False)\n\n    else:\n        raise FileNotFoundError(\"TNCtrips.csv\")\n\n    files = [\"Trip\" + \"_\" + i + \"_\" + j + \".csv\" for i, j in\n                itertools.product([\"FA\", \"GO\", \"IN\", \"RE\", \"SV\", \"TH\", \"WH\"],\n                                   [\"OE\", \"AM\", \"MD\", \"PM\", \"OL\"])]\n\n    for file in files:\n        logging.info(f\"Aggregating - {file}\")\n        df = pd.read_csv(os.path.join(model_dir, \"output\", file))\n        df['I'] = df['I'].map(dict_clusters)\n        df['J'] = df['J'].map(dict_clusters)\n        df['HomeZone'] = df['HomeZone'].map(dict_clusters)\n        df.to_csv(os.path.join(rsm_dir, \"output\",file), index = False)\n
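Each block above repeats the same read, remap, write pattern for one output file. As a minimal sketch (not part of the `rsm` package), a hypothetical helper `remap_zone_columns` condenses that pattern; the toy crosswalks and trip table are made up for illustration. As with `Series.map(dict)` in the original code, any id missing from a crosswalk becomes NaN.

```python
import pandas as pd


def remap_zone_columns(df, mgra_cols, taz_cols, mgra_cwk, taz_cwk):
    """Return a copy of df with MGRA and TAZ id columns mapped to RSM cluster ids.

    Hypothetical helper; ids missing from a crosswalk become NaN, as with Series.map(dict).
    """
    out = df.copy()
    for col in mgra_cols:
        out[col] = out[col].map(mgra_cwk)
    for col in taz_cols:
        out[col] = out[col].map(taz_cwk)
    return out


# Toy crosswalks: original zone id -> RSM cluster id
mgra_cwk = {1: 10, 2: 10, 3: 20}
taz_cwk = {101: 1, 102: 2}

trips = pd.DataFrame({
    "originMGRA": [1, 3],
    "destinationMGRA": [2, 1],
    "originTAZ": [101, 102],
    "destinationTAZ": [102, 101],
})

rsm_trips = remap_zone_columns(
    trips,
    mgra_cols=["originMGRA", "destinationMGRA"],
    taz_cols=["originTAZ", "destinationTAZ"],
    mgra_cwk=mgra_cwk,
    taz_cwk=taz_cwk,
)
print(rsm_trips)
```

In the aggregation step, each file would then need only a read, one call with its own column names, and a write.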
      "},{"location":"api.html#rsm.translate.copy_transit_demand","title":"copy_transit_demand(matrix_names, input_dir='.', output_dir='.')","text":"

      Copies the OMX transit demand matrices, unchanged, to the RSM directory.

      "},{"location":"api.html#rsm.translate.copy_transit_demand--parameters","title":"Parameters","text":"

      matrix_names : matrix_names (list) omx matrix filenames to copy
      input_dir : input_dir (Path-like) default \u201c.\u201d
      output_dir : output_dir (Path-like) default \u201c.\u201d

      "},{"location":"api.html#rsm.translate.copy_transit_demand--returns","title":"Returns","text":"Source code in rsm/translate.py
      def copy_transit_demand(\n    matrix_names,\n    input_dir=\".\",\n    output_dir=\".\"\n):\n\"\"\"\n    copies the omx transit demand matrix to rsm directory\n\n    Parameters\n    ----------\n    matrix_names : matrix_names (list)\n        omx matrix filenames to aggregate\n    input_dir : input_dir (Path-like) \n        default \".\"\n    output_dir : output_dir (Path-like)\n        default \".\"\n\n    Returns\n    -------\n\n    \"\"\"\n\n\n    for mat_name in matrix_names:\n        if '.omx' not in mat_name:\n            mat_name = mat_name + \".omx\"\n\n        input_file_dir = os.path.join(input_dir, mat_name)\n        output_file_dir = os.path.join(output_dir, mat_name)\n\n        shutil.copy(input_file_dir, output_file_dir)\n
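For readers adapting this step, here is a hedged `pathlib` sketch of the same copy. `copy_omx_files` is a hypothetical name, and the temporary directories and placeholder bytes stand in for real OMX matrices (nothing here opens the file as OMX). It checks the suffix with `str.endswith` instead of a substring test and returns the copied paths so a caller can log them.

```python
import shutil
import tempfile
from pathlib import Path


def copy_omx_files(matrix_names, input_dir=".", output_dir="."):
    """Copy OMX files unchanged from input_dir to output_dir (hypothetical helper)."""
    copied = []
    for name in matrix_names:
        # Append the extension when the caller passes a bare matrix name
        if not name.endswith(".omx"):
            name = name + ".omx"
        src = Path(input_dir) / name
        dst = Path(output_dir) / name
        shutil.copy(src, dst)
        copied.append(dst)
    return copied


# Usage with throwaway directories and a dummy file standing in for an OMX matrix
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
    (Path(src_dir) / "transit_demand.omx").write_bytes(b"placeholder")
    result = copy_omx_files(["transit_demand"], src_dir, dst_dir)
    copied_names = [p.name for p in result]
    copied_ok = all(p.exists() for p in result)

print(copied_names, copied_ok)
```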
      "},{"location":"api.html#rsm.translate.translate_emmebank_demand","title":"translate_emmebank_demand(input_databank, output_databank, cores_to_aggregate, agg_zone_mapping)","text":"

      Aggregates the demand matrix cores from one Emme databank to the aggregated zone system and loads them into another databank.

      "},{"location":"api.html#rsm.translate.translate_emmebank_demand--parameters","title":"Parameters","text":"

      input_databank : input_databank (Emme databank) Emme databank to read the demand matrix cores from
      output_databank : output_databank (Emme databank) Emme databank to write the aggregated matrix cores to
      cores_to_aggregate : cores_to_aggregate (list) matrix core names to aggregate
      agg_zone_mapping : agg_zone_mapping (Path-like or pandas.DataFrame) zone number mapping between original and aggregated zones; columns: original zones as \u2018taz\u2019 and aggregated zones as \u2018cluster_id\u2019

      "},{"location":"api.html#rsm.translate.translate_emmebank_demand--returns","title":"Returns","text":"

      None. Loads the aggregated trip matrices into the output Emme databank.

      Source code in rsm/translate.py
      def translate_emmebank_demand(\n    input_databank,\n    output_databank,\n    cores_to_aggregate,\n    agg_zone_mapping,\n): \n\"\"\"\n    aggregates the demand matrix cores from one emme databank and loads them into another databank\n\n    Parameters\n    ----------\n    input_databank : input_databank (Emme databank)\n        Emme databank\n    output_databank : output_databank (Emme databank)\n        Emme databank\n    cores_to_aggregate : cores_to_aggregate (list)\n        matrix corenames to aggregate\n    agg_zone_mapping: agg_zone_mapping (Path-like or pandas.DataFrame)\n        zone number mapping between original and aggregated zones. \n        columns: original zones as 'taz' and aggregated zones as 'cluster_id'\n\n    Returns\n    -------\n    None. Loads the trip matrices into emmebank.\n\n    \"\"\"\n\n    agg_zone_mapping_df = pd.read_csv(os.path.join(agg_zone_mapping))\n    agg_zone_mapping_df = agg_zone_mapping_df.sort_values('taz')\n\n    agg_zone_mapping_df.columns= agg_zone_mapping_df.columns.str.strip().str.lower()\n    zone_mapping = dict(zip(agg_zone_mapping_df['taz'], agg_zone_mapping_df['cluster_id']))\n\n    for core in cores_to_aggregate: \n        matrix = input_databank.matrix(core).get_data()\n        matrix_array = matrix.to_numpy()\n\n        matrix_agg = _aggregate_matrix(matrix_array, zone_mapping)\n\n        output_matrix = output_databank.matrix(core)\n        output_matrix.set_numpy_data(matrix_agg)\n
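The private `_aggregate_matrix` helper used above is not shown in this reference. The sketch below is a hypothetical stand-in, not the package's implementation. It assumes that row and column i of the input hold original zone i + 1 and that cluster ids run from 1 to K, and it sums origins and destinations by cluster with an indicator matrix. On the 4 x 4 toy matrix, zones 1 and 2 form cluster 1 and zones 3 and 4 form cluster 2, so the result is [[10, 18], [42, 50]].

```python
import numpy as np


def aggregate_matrix(matrix, zone_mapping):
    """Sum an N x N zone matrix into a K x K cluster matrix (hypothetical stand-in).

    Assumes row/column i holds original zone i + 1 and cluster ids run from 1 to K.
    """
    n = matrix.shape[0]
    # 0-based cluster index for each original zone, in matrix order
    clusters = np.array([zone_mapping[zone] for zone in range(1, n + 1)]) - 1
    k = clusters.max() + 1
    # Indicator matrix: member[i, c] = 1 when original zone i belongs to cluster c
    member = np.zeros((n, k))
    member[np.arange(n), clusters] = 1.0
    # Summing both origins and destinations by cluster is member.T @ matrix @ member
    return member.T @ matrix @ member


matrix = np.arange(16, dtype=float).reshape(4, 4)
zone_mapping = {1: 1, 2: 1, 3: 2, 4: 2}
agg = aggregate_matrix(matrix, zone_mapping)
print(agg)
```

Total demand is preserved by construction. For large zone systems, a sparse indicator matrix (for example from `scipy.sparse`) would use less memory than the dense one shown here.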
      "},{"location":"api.html#rsm.translate.translate_omx_demand","title":"translate_omx_demand(matrix_names, agg_zone_mapping, input_dir='.', output_dir='.')","text":"

      Aggregates the OMX demand matrices to the aggregated zone system.

      "},{"location":"api.html#rsm.translate.translate_omx_demand--parameters","title":"Parameters","text":"

      matrix_names : matrix_names (list) omx matrix filenames to aggregate
      agg_zone_mapping : agg_zone_mapping (path_like or pandas.DataFrame) zone number mapping between original and aggregated zones; columns: original zones as \u2018taz\u2019 and aggregated zones as \u2018cluster_id\u2019
      input_dir : input_dir (path_like) default \u201c.\u201d
      output_dir : output_dir (path_like) default \u201c.\u201d

      "},{"location":"api.html#rsm.translate.translate_omx_demand--returns","title":"Returns","text":"Source code in rsm/translate.py
      def translate_omx_demand(\n    matrix_names,\n    agg_zone_mapping,\n    input_dir=\".\",\n    output_dir=\".\"\n): \n\"\"\"\n    aggregates the omx demand matrix to aggregated zone system\n\n    Parameters\n    ----------\n    matrix_names : matrix_names (list)\n        omx matrix filenames to aggregate\n    agg_zone_mapping: agg_zone_mapping (path_like or pandas.DataFrame)\n        zone number mapping between original and aggregated zones. \n        columns: original zones as 'taz' and aggregated zones as 'cluster_id'\n    input_dir : input_dir (path_like)\n        default \".\"\n    output_dir : output_dir (path_like) \n        default \".\"\n\n    Returns\n    -------\n\n    \"\"\"\n\n    agg_zone_mapping_df = pd.read_csv(os.path.join(agg_zone_mapping))\n    agg_zone_mapping_df = agg_zone_mapping_df.sort_values('taz')\n\n    agg_zone_mapping_df.columns= agg_zone_mapping_df.columns.str.strip().str.lower()\n    zone_mapping = dict(zip(agg_zone_mapping_df['taz'], agg_zone_mapping_df['cluster_id']))\n    agg_zones = sorted(agg_zone_mapping_df['cluster_id'].unique())\n\n    for mat_name in matrix_names:\n        if '.omx' not in mat_name:\n            mat_name = mat_name + \".omx\"\n\n        #logger.info(\"Aggregating Matrix: \" + mat_name + \" ...\")\n\n        input_skim_file = os.path.join(input_dir, mat_name)\n        print(input_skim_file)\n        output_skim_file = os.path.join(output_dir, mat_name)\n\n        assert os.path.isfile(input_skim_file)\n\n        input_matrix = omx.open_file(input_skim_file, mode=\"r\") \n        input_mapping_name = input_matrix.list_mappings()[0]\n        input_cores = input_matrix.list_matrices()\n\n        output_matrix = omx.open_file(output_skim_file, mode=\"w\")\n\n        for core in input_cores:\n            matrix = input_matrix[core]\n            matrix_array = matrix.read()\n            matrix_agg = _aggregate_matrix(matrix_array, zone_mapping)\n            output_matrix[core] = matrix_agg\n\n        
output_matrix.create_mapping(title=input_mapping_name, entries=agg_zones)\n\n        input_matrix.close()\n        output_matrix.close()\n
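Both translate functions build the zone mapping from a crosswalk in the same way. The following is a minimal sketch of that step as a hypothetical helper, `load_zone_mapping`. It accepts either a path/buffer or a DataFrame, as the docstrings describe, whereas the code above always calls `pd.read_csv`. It also lowercases the headers before sorting by `taz`; the functions above sort first, so a crosswalk whose header reads `TAZ` would raise a `KeyError` at the sort. The CSV text is a made-up example.

```python
import io

import pandas as pd


def load_zone_mapping(agg_zone_mapping):
    """Return (taz -> cluster_id dict, sorted cluster ids); hypothetical helper."""
    if isinstance(agg_zone_mapping, pd.DataFrame):
        df = agg_zone_mapping.copy()
    else:
        df = pd.read_csv(agg_zone_mapping)
    # Normalize headers such as ' TAZ ' or 'Cluster_ID' before using them
    df.columns = df.columns.str.strip().str.lower()
    df = df.sort_values("taz")
    zone_mapping = dict(zip(df["taz"], df["cluster_id"]))
    agg_zones = sorted(df["cluster_id"].unique())
    return zone_mapping, agg_zones


csv_text = " TAZ ,Cluster_ID\n3,2\n1,1\n2,1\n"
zone_mapping, agg_zones = load_zone_mapping(io.StringIO(csv_text))
print(zone_mapping, agg_zones)
```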
      "},{"location":"api.html#rsm.sampler.rsm_household_sampler","title":"rsm_household_sampler(input_dir='.', output_dir='.', prev_iter_access=None, curr_iter_access=None, study_area=None, input_household='households.csv', input_person='persons.csv', taz_crosswalk='taz_crosswalk.csv', mgra_crosswalk='mgra_crosswalk.csv', compare_access_columns=('NONMAN_AUTO', 'NONMAN_TRANSIT', 'NONMAN_NONMOTOR', 'NONMAN_SOV_0'), default_sampling_rate=0.25, lower_bound_sampling_rate=0.15, upper_bound_sampling_rate=1.0, random_seed=42, output_household='sampled_households.csv', output_person='sampled_person.csv')","text":"

      Take an intelligent sample of households.

      "},{"location":"api.html#rsm.sampler.rsm_household_sampler--parameters","title":"Parameters","text":"

      input_dir : input_dir (path_like) default \u201c.\u201d
      output_dir : output_dir (path_like) default \u201c.\u201d
      prev_iter_access : prev_iter_access (Path-like or pandas.DataFrame) Accessibility in an old (default, no treatment, etc.) run is given (preloaded) or read in from here. Give as a relative path (from input_dir) or an absolute path.
      curr_iter_access : curr_iter_access (Path-like or pandas.DataFrame) Accessibility in the latest run is given (preloaded) or read in from here. Give as a relative path (from input_dir) or an absolute path.
      study_area : study_area (array-like) Array of RSM zones (these are numbered 1 to N in the RSM) in the study area. These zones are sampled at 100%.
      input_household : input_household (Path-like or pandas.DataFrame) Complete synthetic household file. This data will be filtered to match the sampling of households and written out to a new CSV file.
      input_person : input_person (Path-like or pandas.DataFrame) Complete synthetic persons file. This data will be filtered to match the sampling of households and written out to a new CSV file.
      compare_access_columns : compare_access_columns (Collection[str]) Column names in the accessibility file to use for comparing accessibility. Only changes in the values in these columns will be evaluated.
      default_sampling_rate : default_sampling_rate (float) The default sampling rate, in the range (0,1].
      lower_bound_sampling_rate : lower_bound_sampling_rate (float) Sampling rates by zone will be truncated so they are never lower than this.
      upper_bound_sampling_rate : upper_bound_sampling_rate (float) Sampling rates by zone will be truncated so they are never higher than this.
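Here is a minimal sketch of the per-zone draw that `study_area` and the sampling-rate parameters feed into. It assumes a toy household table with `hhid` and `mgra` (RSM zone) columns, and `sample_by_zone` is a hypothetical name. Like the source, it seeds each zone's draw with the zone id plus `random_seed`. `DataFrame.sample(frac=...)` rounds `frac * n` to a whole number of households, so zone 1 below keeps 2 of its 8 households, while study-area zone 2 keeps all 8.

```python
import pandas as pd


def sample_by_zone(households, rates, study_area=(), random_seed=42):
    """Sample households zone by zone; study-area zones are kept at 100% (sketch)."""
    # One sampling rate per RSM zone; study-area zones are forced to 1.0
    rates = pd.Series(rates, dtype=float)
    rates[rates.index.isin(list(study_area))] = 1.0
    parts = []
    for zone, rate in rates.items():
        zone_hh = households.loc[households["mgra"] == zone]
        # Seed per zone (zone id + random_seed) so each zone's draw is reproducible
        parts.append(zone_hh.sample(frac=rate, random_state=int(zone) + random_seed))
    return pd.concat(parts).sort_values("hhid")


households = pd.DataFrame({"hhid": range(1, 17), "mgra": [1] * 8 + [2] * 8})
sampled = sample_by_zone(households, {1: 0.25, 2: 0.25}, study_area=[2])
print(sampled["mgra"].value_counts().sort_index())
```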

      "},{"location":"api.html#rsm.sampler.rsm_household_sampler--returns","title":"Returns","text":"

      sample_households_df, sample_persons_df : sample_households_df, sample_persons_df (pandas.DataFrame) The sampled households and persons to resimulate. Both are also written to output_dir.

      Source code in rsm/sampler.py
      def rsm_household_sampler(\n    input_dir=\".\",\n    output_dir=\".\",\n    prev_iter_access=None,\n    curr_iter_access=None,\n    study_area=None,\n    input_household=\"households.csv\",\n    input_person=\"persons.csv\",\n    taz_crosswalk=\"taz_crosswalk.csv\",\n    mgra_crosswalk=\"mgra_crosswalk.csv\",\n    compare_access_columns=(\n        \"NONMAN_AUTO\",\n        \"NONMAN_TRANSIT\",\n        \"NONMAN_NONMOTOR\",\n        \"NONMAN_SOV_0\",\n    ),\n    default_sampling_rate=0.25,  # fix the values of this after some testing\n    lower_bound_sampling_rate=0.15,  # fix the values of this after some testing\n    upper_bound_sampling_rate=1.0,  # fix the values of this after some testing\n    random_seed=42,\n    output_household=\"sampled_households.csv\",\n    output_person=\"sampled_person.csv\",\n):\n\"\"\"\n    Take an intelligent sampling of households.\n\n    Parameters\n    ----------\n    input_dir : input_dir (path_like)\n        default \".\"\n    output_dir : output_dir (path_like)\n        default \".\"\n    prev_iter_access : prev_iter_access (Path-like or pandas.DataFrame)\n        Accessibility in an old (default, no treatment, etc) run is given (preloaded)\n        or read in from here. Give as a relative path (from `input_dir`) or an\n        absolute path.\n    curr_iter_access : curr_iter_access (Path-like or pandas.DataFrame)\n        Accessibility in the latest run is given (preloaded) or read in from here.\n        Give as a relative path (from `input_dir`) or an absolute path.\n    study_area : study_area (array-like)\n        Array of RSM zone (these are numbered 1 to N in the RSM) in the study area. These zones are sampled at 100%.\n    input_household : input_household (Path-like or pandas.DataFrame)\n        Complete synthetic household file.  
This data will be filtered to match the\n        sampling of households and written out to a new CSV file.\n    input_person : input_person (Path-like or pandas.DataFrame)\n        Complete synthetic persons file.  This data will be filtered to match the\n        sampling of households and written out to a new CSV file.\n    compare_access_columns : compare_access_columns (Collection[str])\n        Column names in the accessibility file to use for comparing accessibility.\n        Only changes in the values in these columns will be evaluated.\n    default_sampling_rate : default_sampling_rate (float)\n        The default sampling rate, in the range (0,1]\n    lower_bound_sampling_rate : lower_bound_sampling_rate (float)\n        Sampling rates by zone will be truncated so they are never lower than this.\n    upper_bound_sampling_rate : upper_bound_sampling_rate (float)\n        Sampling rates by zone will be truncated so they are never higher than this.\n\n    Returns\n    -------\n    sample_households_df, sample_persons_df : sample_households_df, sample_persons_df (pandas.DataFrame)\n        These are the sampled population to resimulate.  
They are also written to\n        the output_dir\n    \"\"\"\n\n    input_dir = Path(input_dir or \".\")\n    output_dir = Path(output_dir or \".\")\n\n    logger.debug(\"CALL rsm_household_sampler\")\n    logger.debug(f\"  {input_dir=}\")\n    logger.debug(f\"  {output_dir=}\")\n\n    def _resolve_df(x, directory, make_index=None):\n        if isinstance(x, (str, Path)):\n            # read in the file to a pandas DataFrame\n            x = Path(x).expanduser()\n            if not x.is_absolute():\n                x = Path(directory or \".\").expanduser().joinpath(x)\n            try:\n                result = pd.read_csv(x)\n            except FileNotFoundError:\n                raise\n        elif isinstance(x, pd.DataFrame):\n            result = x\n        elif x is None:\n            result = None\n        else:\n            raise TypeError(\"must be path-like or DataFrame\")\n        if (\n            result is not None\n            and make_index is not None\n            and make_index in result.columns\n        ):\n            result = result.set_index(make_index)\n        return result\n\n    def _resolve_out_filename(x):\n        x = Path(x).expanduser()\n        if not x.is_absolute():\n            x = Path(output_dir).expanduser().joinpath(x)\n        x.parent.mkdir(parents=True, exist_ok=True)\n        return x\n\n    prev_iter_access_df = _resolve_df(\n        prev_iter_access, input_dir, make_index=\"MGRA\"\n    )\n    curr_iter_access_df = _resolve_df(\n        curr_iter_access, input_dir, make_index=\"MGRA\"\n    )\n    rsm_zones = _resolve_df(taz_crosswalk, input_dir)\n    dict_clusters = dict(zip(rsm_zones[\"taz\"], rsm_zones[\"cluster_id\"]))\n\n    rsm_mgra_zones = _resolve_df(mgra_crosswalk, input_dir)\n    rsm_mgra_zones.columns = rsm_mgra_zones.columns.str.strip().str.lower()\n    dict_clusters_mgra = dict(zip(rsm_mgra_zones[\"mgra\"], rsm_mgra_zones[\"cluster_id\"]))\n\n    # changing the taz and mgra to new cluster ids\n    
input_household_df = _resolve_df(input_household, input_dir)\n    input_household_df[\"taz\"] = input_household_df[\"taz\"].map(dict_clusters)\n    input_household_df[\"mgra\"] = input_household_df[\"mgra\"].map(dict_clusters_mgra)\n    input_household_df[\"count\"] = 1\n\n    mgra_hh = input_household_df.groupby([\"mgra\"]).size().rename(\"n_hh\").to_frame()\n\n    if curr_iter_access_df is None or prev_iter_access_df is None:\n\n        if curr_iter_access_df is None:\n            logger.warning(f\"missing curr_iter_access_df from {curr_iter_access}\")\n        if prev_iter_access_df is None:\n            logger.warning(f\"missing prev_iter_access_df from {prev_iter_access}\")\n        # true when sampler is turned off. default_sampling_rate should be set to 1\n\n        mgra_hh[\"sampling_rate\"] = default_sampling_rate\n        if study_area is not None:\n            mgra_hh.loc[mgra_hh.index.isin(study_area), \"sampling_rate\"] = 1\n\n        sample_households = []\n\n        for mgra_id, row in mgra_hh.iterrows():\n            df = input_household_df.loc[input_household_df[\"mgra\"] == mgra_id]\n            sampling_rate = row[\"sampling_rate\"]\n            logger.info(f\"Sampling rate of RSM zone {mgra_id}: {sampling_rate}\")\n            df = df.sample(frac=sampling_rate, random_state=mgra_id + random_seed)\n            sample_households.append(df)\n\n        # combine study area and non-study area households into single dataframe\n        sample_households_df = pd.concat(sample_households)\n\n    else:\n        # restrict to rows only where TAZs have households\n        prev_iter_access_df = prev_iter_access_df[\n            prev_iter_access_df.index.isin(mgra_hh.index)\n        ].copy()\n        curr_iter_access_df = curr_iter_access_df[\n            curr_iter_access_df.index.isin(mgra_hh.index)\n        ].copy()\n\n        # compare accessibility columns\n        compare_results = pd.DataFrame()\n\n        for column in compare_access_columns:\n            
compare_results[column] = (\n                curr_iter_access_df[column] - prev_iter_access_df[column]\n            ).abs()  # take absolute difference\n        compare_results[\"MGRA\"] = prev_iter_access_df.index\n\n        compare_results = compare_results.set_index(\"MGRA\")\n\n        # Take row sums of all difference\n        compare_results[\"Total\"] = compare_results[list(compare_access_columns)].sum(\n            axis=1\n        )\n\n        # TODO: potentially adjust this later after we figure out a better approach\n        wgts = compare_results[\"Total\"] + 0.01\n        wgts /= wgts.mean() / default_sampling_rate\n        compare_results[\"sampling_rate\"] = np.clip(\n            wgts, lower_bound_sampling_rate, upper_bound_sampling_rate\n        )\n\n        sample_households = []\n        sample_rate_df = compare_results[[\"sampling_rate\"]].copy()\n        if study_area is not None:\n            sample_rate_df.loc[\n                sample_rate_df.index.isin(study_area), \"sampling_rate\"\n            ] = 1\n\n        for mgra_id, row in sample_rate_df.iterrows():\n            df = input_household_df.loc[input_household_df[\"mgra\"] == mgra_id]\n            sampling_rate = row[\"sampling_rate\"]\n            logger.info(f\"Sampling rate of RSM zone {mgra_id}: {sampling_rate}\")\n            df = df.sample(frac=sampling_rate, random_state=mgra_id + random_seed)\n            sample_households.append(df)\n\n        # combine study are and non-study area households into single dataframe\n        sample_households_df = pd.concat(sample_households)\n\n    sample_households_df = sample_households_df.sort_values(by=[\"hhid\"])\n    sample_households_df.to_csv(_resolve_out_filename(output_household), index=False)\n\n    # select persons belonging to sampled households\n    sample_hhids = sample_households_df[\"hhid\"].to_numpy()\n\n    persons_df = _resolve_df(input_person, input_dir)\n    sample_persons_df = 
persons_df.loc[persons_df[\"hhid\"].isin(sample_hhids)]\n    sample_persons_df.to_csv(_resolve_out_filename(output_person), index=False)\n\n    global_sample_rate = round(len(sample_households_df) / len(input_household_df),2)\n    logger.info(f\"Total Sampling Rate : {global_sample_rate}\")\n\n    return sample_households_df, sample_persons_df\n
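When both accessibility files are available, the intelligent-sampler branch converts accessibility changes into zone sampling rates. The sketch below restates that weighting as a standalone function on a made-up four-zone example with one compared column. `zone_sampling_rates` is a hypothetical name, and the sketch omits the study-area override and the household draw. Only zone 4's accessibility changes, so it receives a rate of about 0.97, and the other zones are clipped up to the 0.15 lower bound.

```python
import pandas as pd


def zone_sampling_rates(prev_access, curr_access, columns,
                        default_rate=0.25, lower=0.15, upper=1.0):
    """Per-zone sampling rates from accessibility changes (sketch of the source logic)."""
    # Sum of absolute accessibility changes across the compared columns
    total = (curr_access[columns] - prev_access[columns]).abs().sum(axis=1)
    # Small offset keeps unchanged zones from getting a zero weight
    wgts = total + 0.01
    # Rescale so the mean weight equals the default rate, then clip to the bounds
    wgts = wgts / (wgts.mean() / default_rate)
    return wgts.clip(lower=lower, upper=upper)


idx = pd.Index([1, 2, 3, 4], name="MGRA")
prev = pd.DataFrame({"NONMAN_AUTO": [1.0, 1.0, 1.0, 1.0]}, index=idx)
curr = pd.DataFrame({"NONMAN_AUTO": [1.0, 1.0, 1.0, 2.0]}, index=idx)
rates = zone_sampling_rates(prev, curr, ["NONMAN_AUTO"])
print(rates.round(4).tolist())  # [0.15, 0.15, 0.15, 0.9712]
```

Because clipping lifts the unchanged zones to the lower bound, the mean zone rate in this example is about 0.36 rather than the 0.25 default. The realized global rate logged at the end of the function also depends on how many households each zone contains.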
      "},{"location":"api.html#rsm.assembler.rsm_assemble","title":"rsm_assemble(orig_indiv, orig_joint, rsm_indiv, rsm_joint, households, mgra_crosswalk=None, sample_rate=0.25, run_assembler=1)","text":"

      Assemble and evaluate RSM trip making.

      "},{"location":"api.html#rsm.assembler.rsm_assemble--parameters","title":"Parameters","text":"

      orig_indiv : orig_indiv (path_like) Trips table from the \u201coriginal\u201d model run; should be a comprehensive simulation of all individual trips for all synthetic households.
      orig_joint : orig_joint (path_like) Joint trips table from the \u201coriginal\u201d model run; should be a comprehensive simulation of all joint trips for all synthetic households.
      rsm_indiv : rsm_indiv (path_like) Trips table from the RSM model run; should be a simulation of all individual trips for potentially only a subset of all synthetic households.
      rsm_joint : rsm_joint (path_like) Trips table from the RSM model run; should be a simulation of all joint trips for potentially only a subset of all synthetic households (the same sampled households as in rsm_indiv).
      households : households (path_like) Synthetic household file, used to get home zones for households.
      mgra_crosswalk : mgra_crosswalk (path_like, optional) Crosswalk from original MGRA to clustered zone ids. Provide this crosswalk if the orig_indiv and orig_joint files reference the original MGRA system and those ids need to be converted to aggregated values before merging.
      sample_rate : sample_rate (float) Default/fixed sample rate used when the sampler is turned off; it is used to scale the trips when run_assembler is 0.
      run_assembler : run_assembler (boolean) Flag to indicate whether to run the RSM assembler: 1 runs the assembler and 0 turns it off. Setting this to 0 is only an option if the sampler is turned off.

      "},{"location":"api.html#rsm.assembler.rsm_assemble--returns","title":"Returns","text":"

      final_ind_trips : final_ind_trips (pd.DataFrame) Assembled individual trip table for the RSM run, filling in archived trip values for non-resimulated households. final_jnt_trips : final_jnt_trips (pd.DataFrame) Assembled joint trip table for the RSM run, built the same way. When run_assembler is 1, a summary of changes in trips by mode and household home zone (combined_trips_by_zone) is also computed to check whether undersampled zones have stable travel behavior, but it is not returned.

      Separate tables for individual and joint trips are returned, as required by Java.
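Here is a minimal sketch of the core assembly step, using toy tables with only `hh_id` and `trip_mode` columns. `assemble_trips` is a hypothetical name. As in the source, the set of resimulated households is taken from the household ids that appear in the RSM trips. A sampled household that makes no trips in the RSM run therefore does not appear in that set, so its archived trips are kept.

```python
import pandas as pd


def assemble_trips(original_trips, rsm_trips):
    """Keep RSM trips for resimulated households, archived trips for the rest (sketch)."""
    resimulated = original_trips["hh_id"].isin(rsm_trips["hh_id"].unique())
    return pd.concat([rsm_trips, original_trips.loc[~resimulated]], ignore_index=True)


original = pd.DataFrame({"hh_id": [1, 1, 2, 3], "trip_mode": [1, 2, 1, 3]})
rsm = pd.DataFrame({"hh_id": [2, 2], "trip_mode": [4, 4]})
final = assemble_trips(original, rsm)
print(final)
```

Household 2's single archived trip is replaced by its two RSM trips. The trips of households 1 and 3 are carried over unchanged.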

      Source code in rsm/assembler.py
      def rsm_assemble(\n    orig_indiv,\n    orig_joint,\n    rsm_indiv,\n    rsm_joint,\n    households,\n    mgra_crosswalk=None,\n    sample_rate=0.25,\n    run_assembler=1\n):\n\"\"\"\n    Assemble and evaluate RSM trip making.\n\n    Parameters\n    ----------\n    orig_indiv : orig_indiv (path_like)\n        Trips table from \"original\" model run, should be comprehensive simulation\n        of all individual trips for all synthetic households.\n    orig_joint : orig_joint (path_like)\n        Joint trips table from \"original\" model run, should be comprehensive simulation\n        of all joint trips for all synthetic households.\n    rsm_indiv : rsm_indiv (path_like)\n        Trips table from RSM model run, should be a simulation of all individual\n        trips for potentially only a subset of all synthetic households.\n    rsm_joint : rsm_joint (path_like)\n        Trips table from RSM model run, should be a simulation of all joint\n        trips for potentially only a subset of all synthetic households (the\n        same sampled households as in `rsm_indiv`).\n    households : households (path_like)\n        Synthetic household file, used to get home zones for households.\n    mgra_crosswalk : mgra_crosswalk (path_like, optional)\n        Crosswalk from original MGRA to clustered zone ids.  Provide this crosswalk\n        if the `orig_indiv` and `orig_joint` files reference the original MGRA system\n        and those id's need to be converted to aggregated values before merging.\n    sample_rate : sample_rate (float)\n        Default/fixed sample rate if sampler was turned off\n        this is used to scale the trips if run_assembler is 0\n    run_assembler : run_assembler (boolean)\n        Flag to indicate whether to run RSM assembler or not. 
\n        1 is to run assembler, 0 is to turn if off\n        setting this to 0 is only an option if sampler is turned off       \n\n    Returns\n    -------\n    final_trips_rsm : final_ind_trips (pd.DataFrame)\n        Assembled trip table for RSM run, filling in archived trip values for\n        non-resimulated households.\n    combined_trips_by_zone : final_jnt_trips (pd.DataFrame)\n        Summary table of changes in trips by mode, by household home zone.\n        Used to check whether undersampled zones have stable travel behavior.\n\n    Separate tables for individual and joint trips, as required by java.\n\n\n    \"\"\"\n    orig_indiv = Path(orig_indiv).expanduser()\n    orig_joint = Path(orig_joint).expanduser()\n    rsm_indiv = Path(rsm_indiv).expanduser()\n    rsm_joint = Path(rsm_joint).expanduser()\n    households = Path(households).expanduser()\n\n    assert os.path.isfile(orig_indiv)\n    assert os.path.isfile(orig_joint)\n    assert os.path.isfile(rsm_indiv)\n    assert os.path.isfile(rsm_joint)\n    assert os.path.isfile(households)\n\n    if mgra_crosswalk is not None:\n        mgra_crosswalk = Path(mgra_crosswalk).expanduser()\n        assert os.path.isfile(mgra_crosswalk)\n\n    # load trip data - partial simulation of RSM model\n    logger.info(\"reading ind_trips_rsm\")\n    ind_trips_rsm = pd.read_csv(rsm_indiv)\n    logger.info(\"reading jnt_trips_rsm\")\n    jnt_trips_rsm = pd.read_csv(rsm_joint)\n\n    if run_assembler == 1:\n        # load trip data - full simulation of residual/source model\n        logger.info(\"reading ind_trips_full\")\n        ind_trips_full = pd.read_csv(orig_indiv)\n        logger.info(\"reading jnt_trips_full\")\n        jnt_trips_full = pd.read_csv(orig_joint)\n\n        if mgra_crosswalk is not None:\n            logger.info(\"applying mgra_crosswalk to original data\")\n            mgra_crosswalk = pd.read_csv(mgra_crosswalk).set_index(\"MGRA\")[\"cluster_id\"]\n            mgra_crosswalk[-1] = -1\n            
mgra_crosswalk[0] = 0\n            for col in [c for c in ind_trips_full.columns if c.lower().endswith(\"_mgra\")]:\n                ind_trips_full[col] = ind_trips_full[col].map(mgra_crosswalk)\n            for col in [c for c in jnt_trips_full.columns if c.lower().endswith(\"_mgra\")]:\n                jnt_trips_full[col] = jnt_trips_full[col].map(mgra_crosswalk)\n\n        # convert to rsm trips\n        logger.info(\"convert to common table platform\")\n        rsm_trips = _merge_joint_and_indiv_trips(ind_trips_rsm, jnt_trips_rsm)\n        original_trips = _merge_joint_and_indiv_trips(ind_trips_full, jnt_trips_full)\n\n        logger.info(\"get all hhids in trips produced by RSM\")\n        hh_ids_rsm = rsm_trips[\"hh_id\"].unique()\n\n        logger.info(\"remove orginal model trips made by households chosen in RSM trips\")\n        original_trips_not_resimulated = original_trips.loc[\n            ~original_trips[\"hh_id\"].isin(hh_ids_rsm)\n        ]\n        original_ind_trips_not_resimulated = ind_trips_full[\n            ~ind_trips_full[\"hh_id\"].isin(hh_ids_rsm)\n        ]\n        original_jnt_trips_not_resimulated = jnt_trips_full[\n            ~jnt_trips_full[\"hh_id\"].isin(hh_ids_rsm)\n        ]\n\n        logger.info(\"concatenate trips from rsm and original model\")\n        final_trips_rsm = pd.concat(\n            [rsm_trips, original_trips_not_resimulated], ignore_index=True\n        ).reset_index(drop=True)\n        final_ind_trips = pd.concat(\n            [ind_trips_rsm, original_ind_trips_not_resimulated], ignore_index=True\n        ).reset_index(drop=True)\n        final_jnt_trips = pd.concat(\n            [jnt_trips_rsm, original_jnt_trips_not_resimulated], ignore_index=True\n        ).reset_index(drop=True)\n\n        # Get percentage change in total trips by mode for each home zone\n\n        # extract trips made by households in RSM and Original model\n        original_trips_that_were_resimulated = original_trips.loc[\n            
original_trips[\"hh_id\"].isin(hh_ids_rsm)\n        ]\n\n        def _agg_by_hhid_and_tripmode(df, name):\n            return df.groupby([\"hh_id\", \"trip_mode\"]).size().rename(name).reset_index()\n\n        # combining trips by hhid and trip mode\n        combined_trips = pd.merge(\n            _agg_by_hhid_and_tripmode(original_trips_that_were_resimulated, \"n_trips_orig\"),\n            _agg_by_hhid_and_tripmode(rsm_trips, \"n_trips_rsm\"),\n            on=[\"hh_id\", \"trip_mode\"],\n            how=\"outer\",\n            sort=True,\n        ).fillna(0)\n\n        # aggregating by Home zone\n        hh_rsm = pd.read_csv(households)\n        hh_id_col_names = [\"hhid\", \"hh_id\", \"household_id\"]\n        for hhid in hh_id_col_names:\n            if hhid in hh_rsm.columns:\n                break\n        else:\n            raise KeyError(f\"none of {hh_id_col_names!r} in household file\")\n        homezone_col_names = [\"mgra\", \"home_mgra\"]\n        for zoneid in homezone_col_names:\n            if zoneid in hh_rsm.columns:\n                break\n        else:\n            raise KeyError(f\"none of {homezone_col_names!r} in household file\")\n        hh_rsm = hh_rsm[[hhid, zoneid]]\n\n        # attach home zone id\n        combined_trips = pd.merge(\n            combined_trips, hh_rsm, left_on=\"hh_id\", right_on=hhid, how=\"left\"\n        )\n\n        combined_trips_by_zone = (\n            combined_trips.groupby([zoneid, \"trip_mode\"])[[\"n_trips_orig\", \"n_trips_rsm\"]]\n            .sum()\n            .reset_index()\n        )\n\n        combined_trips_by_zone = combined_trips_by_zone.eval(\n            \"net_change = (n_trips_rsm - n_trips_orig)\"\n        )\n\n        combined_trips_by_zone[\"max_trips\"] = np.fmax(\n            combined_trips_by_zone.n_trips_rsm, combined_trips_by_zone.n_trips_orig\n        )\n        combined_trips_by_zone = combined_trips_by_zone.eval(\n            \"pct_change = net_change / max_trips * 100\"\n        )\n   
     combined_trips_by_zone = combined_trips_by_zone.drop(columns=\"max_trips\")\n    else:\n        # if assembler is set to be turned off\n        # then scale the trips in the trip list using the fixed sample rate \n        # trips in the final trip lists will be 100%\n        scale_factor = int(1.0/sample_rate)\n\n        # concat is slow\n        # https://stackoverflow.com/questions/50788508/how-can-i-replicate-rows-of-a-pandas-dataframe\n        #final_ind_trips = pd.concat([ind_trips_rsm]*scale_factor, ignore_index=True)\n        #final_jnt_trips = pd.concat([jnt_trips_rsm]*scale_factor, ignore_index=True)\n\n        final_ind_trips = pd.DataFrame(\n            np.repeat(ind_trips_rsm.values, scale_factor, axis=0),\n            columns=ind_trips_rsm.columns\n        )\n\n        final_jnt_trips = pd.DataFrame(\n            np.repeat(jnt_trips_rsm.values, scale_factor, axis=0),\n            columns=jnt_trips_rsm.columns\n        )        \n\n    return final_ind_trips, final_jnt_trips\n
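When `run_assembler` is 0, the code above expands the RSM trips by `int(1.0/sample_rate)`. The sketch below illustrates two properties of that step on a toy table; the rates and columns are illustrative. First, `int()` truncates, so a 15% rate gives a factor of 6 rather than 6.67, and the expanded trips then represent about 90% of the population. Second, repeating `DataFrame.values` converts mixed-type columns to a single object array, whereas repeating the index keeps each column's dtype.

```python
import numpy as np
import pandas as pd

# Scale factor as computed in the source: int() truncates toward zero
scale_factors = {rate: int(1.0 / rate) for rate in (0.25, 0.15)}
print(scale_factors)  # {0.25: 4, 0.15: 6}

trips = pd.DataFrame({"hh_id": [1, 2], "trip_mode": ["SOV", "WALK"]})

# Approach used in the source: repeat rows of the underlying NumPy array
expanded_np = pd.DataFrame(
    np.repeat(trips.values, scale_factors[0.25], axis=0), columns=trips.columns
)

# Alternative: repeat the index labels, which keeps each column's dtype
expanded_idx = trips.loc[trips.index.repeat(scale_factors[0.25])].reset_index(drop=True)

print(expanded_np["hh_id"].dtype, expanded_idx["hh_id"].dtype)
```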
      "},{"location":"assessment.html","title":"Assessment","text":""},{"location":"assessment.html#rsm-configuration","title":"RSM Configuration","text":"

      The team tested different combinations of RSM parameters, including the number of RSM zones (1000 or 2000), the default sampling rate (15%, 25%, or 100%), enabling or disabling the intelligent sampler, and the number of global iterations (2 or 3), among other factors. The number of RSM zones had its largest effect on the runtime of the highway assignment. Because the highway assignment runtime was already low with 1000 RSM zones, there was no reason to test fewer zones. Changing the sampling rate had a greater impact on the runtime of the demand model (CT-RAMP) than changing the number of RSM zones. Runtimes varied with the specific configuration. Key regional metrics were compared across these test runs to understand the trade-off between faster RSM runtime and RSM results that closely match the ABM. Based on this comparison, the team determined that the \u201coptimal\u201d configuration for the MVP (Minimum Viable Product) version of the RSM uses 2000 RSM zones, a 25% default sampling rate, the intelligent sampler turned off, and 2 global iterations. This configuration was used for the overall assessment of the RSM.
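As a rough illustration of the size of the search space, the sketch below enumerates the full grid of the parameter values listed above and checks that the MVP configuration is one of them. The dictionary keys are hypothetical names, not property names from sandag_abm.properties. The team may have run only a subset of these combinations.

```python
from itertools import product

# Values tested during RSM development (keys are illustrative, not property names).
TESTED = {
    'n_zones': [1000, 2000],
    'sample_rate': [0.15, 0.25, 1.00],
    'intelligent_sampler': [True, False],
    'global_iterations': [2, 3],
}

# Configuration selected for the MVP assessment.
MVP_CONFIG = {'n_zones': 2000, 'sample_rate': 0.25,
              'intelligent_sampler': False, 'global_iterations': 2}

def candidate_configs(tested=TESTED):
    # Full factorial grid: one dict per combination of parameter values.
    keys = list(tested)
    return [dict(zip(keys, values)) for values in product(*(tested[k] for k in keys))]

configs = candidate_configs()
print(len(configs), MVP_CONFIG in configs)  # 24 True
```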

      "},{"location":"assessment.html#calibration","title":"Calibration","text":"

      Aggregating the ABM zones to RSM zones distorts the share of walk trips produced by the model. With the RSM model configuration identified above (rapid zones, global iterations, sample rate, etc.), tour mode choice was calibrated to match the RSM mode shares to the ABM2+ mode shares, primarily for walk trips. Calibration constants were applied in the tour mode choice UEC for the School, Maintenance, and Discretionary tour purposes. The mode shares for the Work and University purposes were reasonable, so no calibration was applied to those purposes.

      RSM-specific constants were added to the Tour Mode Choice UEC (TourModeChoice.xls) for some tour purposes. The walk mode share for the Maintenance and Discretionary purposes was first adjusted by calibrating an RSM-specific constant and applying it as a new row in the UEC. For Maintenance or Discretionary tours that involve escorting, an additional calibration constant was introduced to further adjust the walk mode share. Similarly, a different set of constants was added to calibrate the School tour purpose. No other tour purpose needed mode choice calibration, as the RSM results for those purposes were reasonable.
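The text does not say how the RSM-specific constants were derived. A common approach for calibrating mode choice constants, assumed here, is the log-ratio update of the alternative-specific constant: add ln(target share / model share) to each mode's constant and rerun. The sketch below applies one step to a hypothetical single-segment logit model. The function names, constants, and target shares are illustrative only.

```python
import math

def mnl_shares(utilities):
    # Multinomial logit choice probabilities for one market segment.
    top = max(utilities.values())  # subtract the max for numerical stability
    expu = {mode: math.exp(u - top) for mode, u in utilities.items()}
    total = sum(expu.values())
    return {mode: v / total for mode, v in expu.items()}

def update_constants(constants, model_shares, target_shares, damping=1.0):
    # One calibration step: c_new = c_old + damping * ln(target / model).
    updated = {}
    for mode, c in constants.items():
        model, target = model_shares[mode], target_shares[mode]
        if model > 0 and target > 0:
            updated[mode] = c + damping * math.log(target / model)
        else:
            updated[mode] = c  # log ratio undefined; leave the constant unchanged
    return updated

# Hypothetical constants and target shares for illustration.
constants = {'drive': 0.0, 'transit': -1.0, 'walk': -0.5}
targets = {'drive': 0.7, 'transit': 0.1, 'walk': 0.2}
new_constants = update_constants(constants, mnl_shares(constants), targets)
print(mnl_shares(new_constants))  # approximately equal to targets
```

In this toy model, all utility comes from the constants, so one step reproduces the targets. In the full tour mode choice model, utilities also vary with tour and zone attributes. The step is therefore typically repeated, often with damping below 1, until the model shares converge.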

      Note that a minor recalibration will be required for RSM when the number of rapid zones is changed.

      Here is how the mode share and VMT compare before and after the RSM calibration. The donor model in the charts below refers to the ABM2+ run.

      "},{"location":"assessment.html#base-year-validation","title":"Base Year Validation","text":"

      Here is a table comparing ABM2+ and RSM outcomes after the RSM calibration, using several key regional metrics. The roadway segments on I-5 and I-8 used for the volume comparison were chosen at random.

      "},{"location":"assessment.html#runtime-comparison","title":"Runtime Comparison","text":"

      For the base year 2016 simulation, below is the runtime comparison of ABM2+ and RSM.

      "},{"location":"assessment.html#sensitivity-testing","title":"Sensitivity Testing","text":"

      After the RSM was validated for the base year with the chosen design configuration, it was used to carry out hypothetical planning studies for several broad use cases. Results from RSM and ABM2+ were compared for each sensitivity test to assess the performance of RSM and to evaluate whether RSM is a viable tool for such policy planning.

      For each test, a few key metrics from the ABM2+ No Action, ABM2+ Action, RSM No Action, and RSM Action scenario runs were compared. The goal was for RSM and ABM2+ to show similar sensitivities between the Action and No Action scenarios.
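The comparison described above can be expressed as the percent change from No Action to Action for each model, plus the gap between the two models in percentage points. This is a minimal sketch. The metric name and values are illustrative placeholders, not results from the assessment.

```python
def pct_change(no_action, action):
    # Percent change from the No Action to the Action scenario.
    return (action - no_action) / no_action * 100.0

def compare_sensitivity(metrics):
    # metrics: {name: {'abm_no_action', 'abm_action', 'rsm_no_action', 'rsm_action'}}
    # Returns each model's percent change and the gap between them in percentage points.
    table = {}
    for name, m in metrics.items():
        abm = pct_change(m['abm_no_action'], m['abm_action'])
        rsm = pct_change(m['rsm_no_action'], m['rsm_action'])
        table[name] = {'abm_pct': abm, 'rsm_pct': rsm, 'gap_pct_points': rsm - abm}
    return table

# Illustrative numbers only, not results from the assessment.
example = {'regional_vmt': {'abm_no_action': 100.0, 'abm_action': 95.0,
                            'rsm_no_action': 200.0, 'rsm_action': 192.0}}
for name, row in compare_sensitivity(example).items():
    print(name, round(row['abm_pct'], 2), round(row['rsm_pct'], 2),
          round(row['gap_pct_points'], 2))
```

Comparing percent changes rather than absolute values lets the two models be judged on sensitivity even if their base levels differ somewhat.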

      "},{"location":"assessment.html#regional-highway-changes","title":"Regional Highway Changes","text":""},{"location":"assessment.html#auto-operating-cost-50-increase","title":"Auto Operating Cost - 50% Increase","text":""},{"location":"assessment.html#auto-operating-cost-50-decrease","title":"Auto Operating Cost - 50% Decrease","text":""},{"location":"assessment.html#ride-hailing-cost-50-decrease","title":"Ride Hailing Cost - 50% decrease","text":""},{"location":"assessment.html#automated-vehicles-100-adoption","title":"Automated Vehicles - 100% Adoption","text":"

      In the SANDAG model, AV adoption is analyzed by capturing zero-occupancy vehicle movements as simulated in the Household AV Allocation module. RSM skips this AV allocation module, so RSM is not a viable tool for evaluating policies related to automated vehicles.

      "},{"location":"assessment.html#land-use-changes","title":"Land Use Changes","text":"

      RSM and ABM2+ show similar sensitivities for the two tested land use change scenarios.

      "},{"location":"assessment.html#change-in-land-use-job-housing-balance","title":"Change in land use - Job Housing Balance","text":""},{"location":"assessment.html#change-in-land-use-mixed-land-use","title":"Change in land use - Mixed Land Use","text":""},{"location":"assessment.html#regional-transit-changes","title":"Regional Transit Changes","text":""},{"location":"assessment.html#transit-fare","title":"Transit Fare","text":"

      TODO: Add some text to explain how this test was performed using the study area parameter TODO: Add outcome screenshot

      "},{"location":"assessment.html#transit-frequency","title":"Transit Frequency","text":"

      TODO: Add some text to explain how this test was performed using the study area parameter TODO: Add outcome screenshot

      "},{"location":"assessment.html#local-highway-changes","title":"Local Highway Changes","text":""},{"location":"assessment.html#managed-lane-conversion","title":"Managed Lane Conversion","text":"

      TODO: Add some text to explain how this test was performed using the study area parameter TODO: Add outcome screenshot

      "},{"location":"assessment.html#local-transit-changes","title":"Local Transit Changes","text":""},{"location":"assessment.html#rapid-637-brt","title":"Rapid 637 BRT","text":"

      TODO: Add some text to explain how this test was performed using the study area parameter TODO: Add outcome screenshot

      "},{"location":"development.html","title":"Development","text":""},{"location":"development.html#needs","title":"Needs","text":"

      The time needed to configure, run, and summarize results from ABM2+ is too slow to support a nimble, challenging, and engagement-oriented planning process. SANDAG needed a tool that quickly approximates the outcomes of ABM2+. The rapid strategic model, or RSM, was built for this purpose.

      The ABM2+ schematic is shown below.

      "},{"location":"development.html#design-considerations","title":"Design Considerations","text":"

      Reducing the number of zones reduces model runtime.

      • MGRAs are aggregated into Rapid Zones based on their proximity to each other and their similarity with regard to mode choice decisions.
      • RSM will have a variable number of analysis zones, which can be changed quickly to assess trade-offs between runtime and how well the RSM results match the ABM2+ results.
      • Initial testing revealed 2,000 rapid zones is approximately optimal and will be used in initial deployments. For reference, ABM2+ has ~23,000 MGRAs and ~5,000 TAZs.

      Reducing the number of model components reduces runtime.

      • Most, but not all, of the policies of interest to SANDAG primarily affect resident passenger travel.
      • Therefore, RSM will run only the resident passenger travel component, holding the other demand components fixed.

      Reducing the number of global iterations reduces runtime.

      • If the RSM results are in the same ballpark as ABM2+, reduce the number of global iterations from 3 to 2 for the model.

      Reducing sample rate reduces runtime.

      • Runtime of the resident model decreases if a smaller share of the population is simulated.
      • ABM2+ simulates population as 25 percent (first iteration), 50 percent (second iteration) and 100 percent (third iteration).
      • RSM will attempt to intelligently sample population and vary it by TAZ with higher sample rate in zones with large changes in accessibility and lower rates in zones with small changes in accessibility.
      • RSM could also have higher sampling in zones around the analysis project and lower elsewhere.
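To put the sampling schedules in perspective, the arithmetic below counts how many full-population equivalents of households each model simulates across its global iterations. It assumes the MVP RSM configuration (25% default sampling rate, intelligent sampler off, 2 global iterations). It ignores the zone reduction and the skipped model components. CT-RAMP runtime is not necessarily proportional to the number of simulated households.

```python
# Share of the synthetic population simulated in each global iteration.
ABM2_SCHEDULE = [0.25, 0.50, 1.00]  # ABM2+: three global iterations
RSM_SCHEDULE = [0.25, 0.25]         # assumed MVP RSM: 25% default rate, two iterations

def population_equivalents(schedule):
    # Total number of full populations simulated across all iterations.
    return sum(schedule)

abm = population_equivalents(ABM2_SCHEDULE)  # 1.75
rsm = population_equivalents(RSM_SCHEDULE)   # 0.5
print(f'RSM simulates about {rsm / abm:.0%} of the household-iterations ABM2+ simulates')
```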
      "},{"location":"development.html#architecture","title":"Architecture","text":"

      The RSM is developed as a Python package, and the required modules are launched when the existing SANDAG travel model is run as the Rapid Model. It takes a complete ABM2+ model run as input and has the following modules:

      "},{"location":"development.html#zone-aggregator","title":"Zone Aggregator","text":"

      The RSM zone creator/aggregator creates a set of RSM analysis zones (Rapid Zones) and a set of RSM input files compatible with that zone system, using a donor model run (ABM2+/ABM3) as input. The inputs from the donor model include the MGRA shapefile (MGRASHAPE.zip), the MGRA socioeconomic file (example: mgra13_based_input2016.csv), and the individual trips (indivTripData_3.csv). It produces a new MGRA socioeconomic file for the RSM zones and crosswalk files between the original TAZs/MGRAs and the rapid zones. In the model properties file (sandag_abm.properties), the user can also specify other parameters, such as the number of RSM zones, the donor model run directory, the number of external zones, the MGRA socioeconomic file, the names of the crosswalk files generated by the zone aggregator module, an optional study area file (to study localized changes in the region), and the RSM zone centroid CSV files.

      At its core, the RSM zone aggregator performs several steps. The MGRA geographies are loaded from the shapefile, MGRA data is loaded from the MGRA socioeconomic file, and trip data is extracted from the individual trip file. Additional variables, such as intersection counts and density variables, are computed from the MGRA data. The script then aggregates the MGRA attributes to the TAZ (Traffic Analysis Zone) level, and the individual trips file is used to calculate the mode shares for each TAZ. Travel times from each TAZ to several points of interest (by default, San Diego city hall, outside Pendleton gate, Escondido city hall, Viejas casino, and San Ysidro trolley) are also added to the aggregated TAZ data. The TAZs are then clustered into a user-defined number of RSM zones using a clustering algorithm and several cluster factors (the default factors and their weights are \u201cpopden\u201d: 1, \u201cempden\u201d: 1, \u201cmodeshare_NM\u201d: 100, \u201cmodeshare_WT\u201d: 100). The current scripts support KMeans and agglomerative clustering. If the user has specified a study area, the TAZs in it are handled separately and grouped into clusters as specified in the study area file. The remaining TAZs are clustered using the selected clustering algorithm.
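The clustering step can be sketched as weighted k-means on the TAZ cluster factors. The sketch assumes that each factor is multiplied by its weight before distances are computed, but the source does not say how the weights enter the algorithm. The actual scripts may also use a library implementation (for example scikit-learn) rather than the plain Lloyd's algorithm shown here. Study area handling is omitted. The function names, the example data, and numbering the RSM zones 1..n are hypothetical.

```python
import random

# Default cluster factors and weights listed above.
DEFAULT_WEIGHTS = {'popden': 1, 'empden': 1, 'modeshare_NM': 100, 'modeshare_WT': 100}

def weighted_vectors(taz_data, weights):
    # taz_data: {taz_id: {factor: value}}; each factor is scaled by its weight
    # so that heavily weighted factors count more in the Euclidean distance.
    factors = sorted(weights)
    return {taz: [attrs[f] * weights[f] for f in factors] for taz, attrs in taz_data.items()}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, n_iter=100, seed=0):
    # Lloyd's algorithm; returns {taz_id: cluster index in 0..k-1}.
    rng = random.Random(seed)
    ids = sorted(vectors)
    centers = [list(vectors[i]) for i in rng.sample(ids, k)]
    labels = {}
    for _ in range(n_iter):
        new_labels = {
            i: min(range(k), key=lambda c: sq_dist(vectors[i], centers[c])) for i in ids
        }
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [vectors[i] for i in ids if labels[i] == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_tazs(taz_data, n_zones, weights=DEFAULT_WEIGHTS, seed=0):
    # Hypothetical numbering: RSM zone ids 1..n_zones.
    labels = kmeans(weighted_vectors(taz_data, weights), n_zones, seed=seed)
    return {taz: label + 1 for taz, label in labels.items()}

# Hypothetical TAZ attributes: TAZs 1-3 are dense with high walk shares, 4-6 are not.
example = {
    1: {'popden': 10, 'empden': 5, 'modeshare_NM': 0.30, 'modeshare_WT': 0.10},
    2: {'popden': 12, 'empden': 6, 'modeshare_NM': 0.32, 'modeshare_WT': 0.12},
    3: {'popden': 11, 'empden': 4, 'modeshare_NM': 0.31, 'modeshare_WT': 0.11},
    4: {'popden': 2, 'empden': 1, 'modeshare_NM': 0.02, 'modeshare_WT': 0.01},
    5: {'popden': 3, 'empden': 1, 'modeshare_NM': 0.03, 'modeshare_WT': 0.00},
    6: {'popden': 2, 'empden': 2, 'modeshare_NM': 0.01, 'modeshare_WT': 0.01},
}
print(cluster_tazs(example, 2))
```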

      After the clustering, the aggregator produces crosswalks from the original TAZs/MGRAs to the new RSM zones. The elementary and high school enrollments in the new RSM zone socioeconomic file are then checked and adjusted to prevent zero values.

      The user can also control the execution of the zone aggregator from the properties file. Once a baseline RSM run is established, other project-related RSM runs can be set up to skip the zone aggregator and reuse the zone system from the RSM baseline. Please note that MGRAs and TAZs are geographically identical in the RSM model run; only their numbering differs.

      "},{"location":"development.html#input-aggregator","title":"Input Aggregator","text":"

      The input aggregator module of RSM aggregates several input files, UEC (SOA) files, and non-ABM model outputs of the donor model to the new RSM zones. The main inputs to this module include the location of the donor model, the RSM socioeconomic file, and the TAZ and MGRA crosswalks. The module reads the original socioeconomic file and adds the intersection count and several density variables that were originally generated by the 4D module of the current ABM2+ model. This step is done in RSM because the 4D module is skipped when running RSM. The module then uses the crosswalk between MGRAs and RSM zones to aggregate the original socioeconomic data to the new RSM zones, creating a new RSM-specific socioeconomic file. Next, the module aggregates the following input files:

      File Name | Aggregation Columns | Aggregation Methodology
      --- | --- | ---
      microMgraEquivMinutes.csv | walkTime, dist, mmTime, mmCost, mtTime, mtCost, mmGenTime, mtGenTime, minTime | Mapped MGRA to RSM zones and aggregated the columns by taking the mean.
      microMgraTapEquivMinutes.csv | walkTime, dist, mmTime, mmCost, mtTime, mtCost, mmGenTime, mtGenTime, minTime | Mapped MGRA to RSM zones and aggregated the columns by taking the mean.
      walkMgraTapEquivMinutes.csv | boardingPerceived, boardingActual, alightingPerceived, alightingActual, boardingGain, alightingGain | Mapped MGRA to RSM zones and aggregated the columns by taking the mean.
      walkMgraEquivMinutes.csv | percieved, actual, gain | Mapped MGRA to RSM zones and aggregated the columns by taking the mean.
      bikeTazLogsum.csv | logsum, time | Mapped TAZ to RSM zones and aggregated the columns by taking the mean.
      bikeMgraLogsum.csv | logsum, time | Mapped MGRA to RSM zones and aggregated the columns by taking the mean.
      zone.term | terminal_time | Mapped TAZ to RSM zones and took the maximum.
      zones.park | park_zones | Mapped TAZ to RSM zones and took the maximum.
      tap.ptype | | Mapped RSM zones to TAZs.
      accessam.csv | TIME, DISTANCE |
      ParkLocationAlts.csv | parkarea | Mapped MGRA to RSM zones and took the minimum.
      CrossBorderDestinationChoiceSoaAlternatives.csv | | Mapped RSM zones to MGRA.
      TourDcSoaDistanceAlts.csv | a, mgra | Recreated with RSM zones.
      DestinationChoiceAlternatives.csv | a, mgra | Recreated with RSM zones.
      SoaTazDistAlts.csv | a, dest | Recreated with RSM zones.
      TripMatrices.csv | CVM_<>:LT, CVM_<>:IT, CVM_<>:MT, CVM_<>:HT, CVM_<>:LNT, CVM_<>:INT, CVM_<>:MNT, CVM_<>:HNT, where <> is the TIME PERIOD (EA, AM, MD, PM, EV) | Mapped TAZ to RSM zones and aggregated the columns by taking the sum.
      transponderModelAccessibilities.csv | DIST, AVGTTS, PCTDETOUR | Mapped TAZ to RSM zones and aggregated the columns by taking the mean.
      crossBorderTours.csv | | Mapped MGRA/TAZs to RSM zones.
      internalExternalTrips.csv | | Mapped MGRA/TAZs to RSM zones.
      visitorTours.csv | | Mapped MGRA to RSM zones.
      visitorTrips.csv | | Mapped MGRA to RSM zones.
      householdAVTrips.csv | | Mapped MGRA to RSM zones.
      airport_out.SAN.csv | | Mapped MGRA/TAZ to RSM zones.
      airport_out.CBX.csv | | Mapped MGRA/TAZ to RSM zones.
      TNCtrips.csv | | Mapped MGRA/TAZ to RSM zones.
      TRIP_<>_<>.CSV, where the placeholders are SECTOR_TYPE (FA, GO, IN, RE, SV, TH, WH) and TIME_PERIOD (OE, AM, MD, PM, OL) | | Mapped TAZ to RSM zones.

      More details on the above files can be found here.
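Most rows in the table follow the same pattern. Each original zone ID is mapped to its RSM zone through the crosswalk. The rows that fall in the same RSM zone (or RSM origin-destination pair) are then reduced with mean, sum, max, or min. The helper below is a hypothetical, minimal sketch of that pattern using only the standard library. The column names in the example are placeholders and may not match the actual file headers.

```python
from collections import defaultdict
from statistics import mean

# Aggregation methods that appear in the table above.
AGGREGATORS = {'mean': mean, 'sum': sum, 'max': max, 'min': min}

def aggregate_to_rsm(rows, zone_cols, crosswalk, value_cols, method):
    # rows: list of dicts with numeric values (e.g. parsed from a CSV file)
    # zone_cols: columns holding original zone ids, one for a zonal file
    #            (e.g. ['taz']) or two for an origin-destination file (e.g. ['i', 'j'])
    # crosswalk: {original_zone_id: rsm_zone_id}
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(crosswalk[row[c]] for c in zone_cols)
        for col in value_cols:
            groups[key][col].append(row[col])
    agg = AGGREGATORS[method]
    return {key: {col: agg(vals) for col, vals in cols.items()}
            for key, cols in sorted(groups.items())}

# Hypothetical example: terminal times by TAZ, TAZs 1 and 2 merged into RSM zone 1.
crosswalk = {1: 1, 2: 1, 3: 2}
terminal = [{'taz': 1, 'terminal_time': 5}, {'taz': 2, 'terminal_time': 7},
            {'taz': 3, 'terminal_time': 4}]
print(aggregate_to_rsm(terminal, ['taz'], crosswalk, ['terminal_time'], 'max'))
```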

      "},{"location":"development.html#translate-demand","title":"Translate Demand","text":"

      The translate demand module of the RSM aggregates the non-resident demand matrices and trip tables to the new RSM zone structure. The inputs of this module include the path to the RSM model directory, the donor model directory, and the crosswalks. In particular, the module aggregates the auto, transit, non-motorized, and other trip demand from the airport, cross border, internal-external, and visitor models. It also aggregates TNC vehicle trips and empty AV trips.
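Aggregating a demand matrix to the RSM zones amounts to summing all original origin-destination cells whose origin and destination map to the same pair of RSM zones. The sketch below assumes a matrix stored as a dictionary keyed by (origin TAZ, destination TAZ) and a TAZ-to-RSM crosswalk. It is an illustration of the operation, not the module's actual code, and the function name is hypothetical.

```python
from collections import defaultdict

def aggregate_matrix(od_trips, crosswalk):
    # od_trips: {(orig_taz, dest_taz): trips}; crosswalk: {taz: rsm_zone}
    # Trips are summed into the RSM origin-destination cell.
    out = defaultdict(float)
    for (o, d), trips in od_trips.items():
        out[(crosswalk[o], crosswalk[d])] += trips
    return dict(out)

od = {(1, 2): 10, (2, 1): 5, (3, 3): 2, (1, 3): 1}
crosswalk = {1: 1, 2: 1, 3: 2}
print(aggregate_matrix(od, crosswalk))
```

Summing preserves the regional trip total. Trips between two TAZs that fall in the same RSM zone become intrazonal trips.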

      "},{"location":"development.html#intelligent-sampler","title":"Intelligent Sampler","text":"

      The intelligent sampler module samples households and persons from the synthetic household and person data, taking into account accessibility metrics and other parameters. The main inputs to this module are the households file, the person file, and the TAZ/MGRA crosswalks. The outputs are the sampled household and person files. In the model properties file (sandag_abm.properties), the user can choose to run the RSM sampler and specify the default and minimum sampling rates for the RSM model run. The user can also sample specific zones at 100% by specifying them in the study area file and turning on the differential sampling indicator (use.differential.sampling set to 1).

      The sampler function follows these primary steps:

      1. Zone Mapping: The function maps zones from the synthetic households/person data to their corresponding RSM zones using crosswalk data.

      2. Household Sampling:

        • If accessibility data is missing (first iteration) or if the RSM sampler is turned off, a default sampling rate is applied to all RSM zones, with optional 100% sampling in the study area.
        • If accessibility data is available and the RSM sampler is turned on, the function calculates differences in accessibility metrics between the current and previous iterations. The sampling rates are determined based on these differences and are adjusted to be within specified bounds. The RSM zones of the study area are sampled at a 100% sampling rate if the differential sampling indicator is turned on.
      3. Households and Persons Selection: The function selects households based on the calculated sampling rates. It also selects persons associated with the sampled households.

      4. Output:

        • The selected households and persons are written to output CSV files in the specified output directory.
        • The function also computes and logs the total sampling rate, representing the proportion of selected households relative to the total number of households.

      Note that in the current RSM deployment, the sampler is set to use a 25% default sampling rate. The intelligent sampler needs further testing before it can be used to sample households based on accessibility change.
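The sampler steps above can be sketched as follows. The source does not give the rule that turns accessibility changes into sampling rates. The linear scaling from the minimum rate up to 100% used here is a hypothetical placeholder. The function names and the household and person record layouts are also assumptions.

```python
import random

def zone_sampling_rates(zones, default_rate, min_rate, access_change=None,
                        study_area_zones=(), differential=False):
    # Returns {rsm_zone: sampling_rate}.
    # access_change: {rsm_zone: accessibility change vs. the previous iteration},
    # or None when accessibility data is missing or the sampler is turned off.
    if access_change is None:
        rates = {z: default_rate for z in zones}
    else:
        # Hypothetical rule: min_rate for no change, rising linearly to 1.0
        # for the zone with the largest absolute change.
        largest = max((abs(v) for v in access_change.values()), default=0.0)
        rates = {}
        for z in zones:
            share = abs(access_change.get(z, 0.0)) / largest if largest > 0 else 0.0
            rates[z] = min_rate + share * (1.0 - min_rate)
    if differential:
        # Study area zones are sampled at 100%.
        for z in study_area_zones:
            rates[z] = 1.0
    return rates

def sample_households(households, rates, seed=0):
    # households: dicts with 'hh_id' and 'rsm_zone'; keep each with its zone's rate.
    rng = random.Random(seed)
    return [hh for hh in households if rng.random() < rates[hh['rsm_zone']]]

def sample_persons(persons, sampled_households):
    # Keep every person who belongs to a sampled household.
    kept = {hh['hh_id'] for hh in sampled_households}
    return [p for p in persons if p['hh_id'] in kept]

def total_sampling_rate(sampled_households, households):
    # Proportion of selected households relative to all households.
    return len(sampled_households) / len(households)

# Hypothetical synthetic data: 300 households with 2 persons each in 3 RSM zones.
households = [{'hh_id': i, 'rsm_zone': 1 + i % 3} for i in range(300)]
persons = [{'person_id': 2 * i + k, 'hh_id': i} for i in range(300) for k in range(2)]
rates = zone_sampling_rates([1, 2, 3], default_rate=0.25, min_rate=0.1,
                            study_area_zones=[3], differential=True)
sampled = sample_households(households, rates)
print(rates, round(total_sampling_rate(sampled, households), 2),
      len(sample_persons(persons, sampled)))
```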

      "},{"location":"development.html#intelligent-assembler","title":"Intelligent Assembler","text":"

      The intelligent assembler module assembles the trips of the RSM model run and scales them appropriately based on the sampling rate of the RSM zones. The main inputs to this module are the joint and individual trips from the donor and RSM models, the households file, crosswalks for mapping zones, an optional study area file, and a flag indicating whether to run the assembler.

      The assembler function follows these primary steps:

      1. Load Trip Files: The function reads the individual and joint trip data for the RSM run. If the assembler is set to run (flag run_assembler equals 1), the function also loads the corresponding trip data from the donor model run.

      2. Assemble Trips: It converts individual and joint trip data from both the RSM run and the original model run into a common table format using a merging process. It separates trips made by households in the RSM run and those that were not resimulated. Then, it combines these trips to create the final assembled trip data, including individual and joint trips.

      3. Evaluation of Trip Changes: The function calculates and evaluates the percentage change in total trips by mode for each home zone. It aggregates trips made by households in the RSM and original model runs and compares their trip counts by mode. This information is used to assess the stability of travel behavior in different zones.

      4. Alternative Behavior (If Assembler is Off): If the assembler is turned off (flag run_assembler equals 0), the function scales the RSM individual and joint trips based on the specified default sampling rate. This alternative behavior is intended to represent the full set of trips, as if all households had been selected, eliminating the need for the assembler. If the study area file is present and differential sampling is turned on (use.differential.sampling equals 1), the trips made by residents of the study area are not scaled by the RSM default sampling rate.

      5. Outputs: The function returns two outputs: individual trips containing the assembled individual trip data, and joint trips containing the assembled joint trip data. These data files are structured to align with the format required for further analysis or use by Java components.

      In summary, the RSM assembler module takes multiple trip datasets and assembles them to create a unified dataset for further analysis, accommodating cases where only a subset of households were resimulated. The function also evaluates changes in trip behavior across different zones.
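A minimal sketch of the two assembler paths follows, with trips represented as dictionaries that carry an hh_id. With the assembler on, donor-model trips are kept for households that were not resimulated and replaced by RSM trips for those that were. With the assembler off, RSM trips are replicated by the inverse of the sampling rate. The function names are hypothetical, and this sketch rounds 1 / sample_rate to the nearest integer.

```python
def assemble_trips(donor_trips, rsm_trips, resimulated_hh_ids):
    # Assembler on: keep donor trips of households that were not resimulated
    # and add the RSM trips of the resimulated households.
    kept = [t for t in donor_trips if t['hh_id'] not in resimulated_hh_ids]
    return kept + list(rsm_trips)

def scale_trips(rsm_trips, sample_rate, unscaled_hh_ids=frozenset()):
    # Assembler off: replicate each RSM trip round(1 / sample_rate) times, except
    # trips of households in unscaled_hh_ids (e.g. study area residents sampled
    # at 100% under differential sampling), which are kept once.
    factor = int(round(1.0 / sample_rate))
    out = []
    for t in rsm_trips:
        copies = 1 if t['hh_id'] in unscaled_hh_ids else factor
        out.extend(dict(t) for _ in range(copies))
    return out

donor = [{'hh_id': 1, 'mode': 'drive'}, {'hh_id': 1, 'mode': 'walk'},
         {'hh_id': 2, 'mode': 'drive'}]
rsm = [{'hh_id': 2, 'mode': 'transit'}, {'hh_id': 2, 'mode': 'walk'}]
print(len(assemble_trips(donor, rsm, {2})), len(scale_trips(rsm, 0.25)))
```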

      "},{"location":"development.html#user-experience","title":"User Experience","text":"

      The RSM repurposes the ABM2+ Emme-based GUI. The options will be updated to reflect the RSM options, as will the input file locations and other parameters. The RSM user experience will, therefore, be nearly the same as the ABM2+ user experience.

      "},{"location":"userguide.html","title":"User Guide","text":""},{"location":"userguide.html#rsm-setup","title":"RSM Setup","text":"

      Below are the steps to setup an RSM scenario run:

      1. Set up an ABM run on the server\u2019s C drive* by using the ABM2+ release 14.2.2 scenario creation GUI located at T:\\ABM\\release\\ABM\\version_14_2_2\\dist\\createStudyAndScenario.exe.

        *Running the model on the T drive and setting it to run on the local drive causes an error. An issue has been created on GitHub.

      2. Open Anaconda Prompt and type the following command:

        python T:\\projects\\RSM\\setup\\setup_rsm.py [MODEL_RUN_DIRECTORY]

        Specifying the model run directory in the command line is optional. If it is not specified a dialog box will open asking the user to specify the model run.

      3. Change the inputs and properties as needed. Be sure to check the following:

        1. If running a new network, make sure the network files are correct
        2. Check that the RSM properties were appended to the property file and make sure the RSM properties are correct
        3. Check that the updated Tour Mode Choice UEC was copied over
      4. After opening Emme using start_emme_with_virtual_env.bat and opening the SANDAG toolbox in Modeller as usual, set the steps to skip all of the special market models and to run only 2 iterations. Most of these should be set automatically, though you may need to set it to skip the EE model manually.

        Figure 1: Steps to run in SANDAG model GUI for RSM run

      "},{"location":"userguide.html#debugging","title":"Debugging","text":"

      For crashes encountered in CT-RAMP, review the event log as usual. If a crash occurs during an RSM step, review the log file rsm-logging.log, which is created in the LogFiles folder.

      "},{"location":"userguide.html#rsm-specific-changes","title":"RSM Specific Changes","text":""},{"location":"userguide.html#application","title":"Application","text":"
      • sandag_abm.jar
        • New CT-RAMP jar file with a few required Java code updates.
      "},{"location":"userguide.html#bin","title":"Bin","text":"
      • runRSMAccessibility.cmd
        • Runs CT-RAMP to compute the accessibility of each zone
      • runRSMAssembler.cmd
        • Runs the intelligent assembler
      • runRSMEmmebankMatrixAggregator.cmd
        • Opens the Emmebank of the donor model and aggregates the truck and external trip tables
      • runRSMInputAggregator.cmd
        • Aggregates various model inputs into the aggregated zone system creating
      • runRSMSampler.cmd
        • Runs the intelligent sampler, which selects the households and persons to be resimulated
      • runRSMSandagABM.cmd
        • Runs CT-RAMP on the sampled households
      • runRSMSandagABMTripTables.cmd
        • Builds trip tables from assembled trip data
      • runRSMSetProperty.cmd
        • Updates property file to read the accessibility file instead of building it
      • runRSMSetupUpdate.cmd
        • Updates several properties
      • runRSMTripMatrixAggregator.cmd
        • Aggregates trip tables stored in OMX files from donor model
      • runRSMZoneAggregator.cmd
        • Runs the zone aggregator
      "},{"location":"userguide.html#emme_project","title":"Emme_project","text":"
      • start_emme_with_virtualenv.bat
        • New lines to call Python environments used in RSM scripts
      • scripts\\sandag_toolbox.mtbx
        • Updated toolbox with a master run script to call RSM steps
      "},{"location":"userguide.html#input","title":"Input","text":"
      • MGRASHAPE.zip
        • Zipped shapefile of the MGRAs (used in zone aggregator)
      "},{"location":"userguide.html#pythonemmetoolbox","title":"Python\\emme\\toolbox","text":"
      • master_run.py
        • Changed to include new model steps
      • import\\import_auto_demand.py
        • Changes to how the trip tables are read into the Emmebank
      • utilities\\databank_aggregator.py
        • Aggregates matrices stored in the Emmebank
      "},{"location":"userguide.html#new-properties","title":"New Properties","text":"
      • run.rsm.setup
        • Set to 1 if running the RSM setup steps and 0 otherwise
          • Zone aggregator
          • Input aggregator
          • Matrix aggregator
          • Emmebank aggregator
      • run.rsm
        • Set to 1 if running the RSM and 0 otherwise
      • run.rsm.zone.aggregator
        • If set to 1, the zone aggregator will be run. If set to 0, the zone system from a run specified in rsm.baseline.run.dir will be used.
      • rsm.baseline.run.dir
        • Baseline run to read in zone system from if not running zone aggregator
      • rsm.zones
        • Number of zones to use
      • External.zones
        • Number of external zones
      • Run.rsm.sampling
        • 1 if running the intelligent sampler and 0 if not. If set to 0, every zone will have the default sampling rate
      • Rsm.default.sampling.rate
        • Default sampling rate to use when running the intelligent sampler
      • Rsm.centroid.connector.start.id
        • Starting value of tcovid for new zonal connectors to aggregated zones
      • Full.modelrun.dir
        • Filepath of donor model
      • Taz.to.cluster.crosswalk.file
        • Maps TAZs to aggregated zones
      • Mgra.to.cluster.crosswalk.file
        • Maps MGRAs to aggregated zones
      • Cluster.zone.centroid.file
        • Latitude and longitude coordinates of aggregated zone centroids
      "},{"location":"userguide.html#new-files","title":"New Files","text":"
      1. study_area.csv:

        This optional file explicitly defines how to aggregate certain zones and, consequently, which zones not to aggregate. This is useful for project-level analysis, where a modeler may want higher resolution close to a project but not need that resolution further away. The file has two columns, taz and group. The taz column is the zone ID in the ABM zone system, and the group column indicates which RSM zone the ABM zone will be part of. The group value becomes the RSM MGRA ID, and the corresponding RSM TAZ ID is the MGRA ID plus the number of external zones. If a user doesn\u2019t want to aggregate any zones within the study area, the group ID should be distinct for each of them. Presently, all RSM zones defined in the study area are sampled at 100%, and the remaining zones are sampled at the sampling rate set in the property file.

        Any zones not within the study area will be aggregated using the standard RSM zone aggregating algorithm.

        An example of how the study area file works is shown below (assuming 12 external zones):

        Figure 2: ABM Zones

        Table 1: study_area.csv

        taz | group
        --- | ---
        1 | 1
        2 | 2
        3 | 3
        4 | 4
        5 | 5
        6 | 6

        Figure 3: Resulting RSM Zones

        For a practical example, see Figure 4, where a study area was defined as every zone within a half mile of a project. Note that within the study area, no zones were aggregated (as it was defined), but outside of the study area, aggregation occurred.

        Figure 4: Example Study Area
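The ID rule for study_area.csv (the group value becomes the RSM MGRA ID, and the RSM TAZ ID is that MGRA ID plus the number of external zones) can be sketched as follows. The example uses the contents of Table 1 and assumes 12 external zones. The function names are hypothetical. Because csv.DictReader accepts any iterable of lines, the file contents are passed here as a list of strings.

```python
import csv

def read_study_area(lines):
    # Parse study_area.csv content (columns taz, group) into {abm_zone: group}.
    return {int(r['taz']): int(r['group']) for r in csv.DictReader(lines)}

def study_area_ids(study_area, n_external):
    # For each ABM zone in the study area, return (RSM MGRA id, RSM TAZ id).
    return {abm: (group, group + n_external) for abm, group in study_area.items()}

table_1 = ['taz,group', '1,1', '2,2', '3,3', '4,4', '5,5', '6,6']
ids = study_area_ids(read_study_area(table_1), n_external=12)
print(ids[1], ids[6])  # (1, 13) (6, 18)
```

Giving each ABM zone its own group, as in Table 1, keeps every study area zone unaggregated. Repeating a group value would merge those ABM zones into one RSM zone.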

      "},{"location":"visualizer.html","title":"Visualizer","text":""},{"location":"visualizer.html#introduction","title":"Introduction","text":"

      The team developed an RSM visualizer tool that allows users to summarize and compare metrics from multiple RSM model runs. It is a dashboard-style tool built using SimWrapper (an open-source, web-based data visualization tool for disaggregate transportation simulations) and also leverages SANDAG\u2019s Data Pipeline Tool. SimWrapper works by creating a mini file server to host reduced data summaries of the travel model. The dashboard is created via YAML files, which can be customized to automate interactive report summaries, such as charts, summary tables, and spatial maps.

      "},{"location":"visualizer.html#design","title":"Design","text":"

      Visualizer has three main components:

      • Data Pipeline
      • Post Processing
      • SimWrapper Dashboard
      "},{"location":"visualizer.html#data-pipeline","title":"Data Pipeline","text":"

      The SANDAG Data Pipeline Tool helps build data pipelines that ingest, transform, and summarize data by parameterizing the pipeline: rather than coding from scratch, the user configures a few files and the tool handles the rest. The pipeline produces the desired model summaries in CSV format. See here to learn how the tool works. Note that the RSM visualizer currently supports a fixed set of model summaries; additional summaries can be incorporated into the pipeline by modifying the settings, processor, and expression files.

      "},{"location":"visualizer.html#post-processing","title":"Post Processing","text":"

      Next, a post-processing script performs the data manipulations that are done outside of the data pipeline tool, preparing the data in the format required by SimWrapper. As with the data pipeline, users can modify this post-processing script to add new summaries to the SimWrapper dashboard.

      "},{"location":"visualizer.html#simwrapper","title":"SimWrapper","text":"

      Lastly, the summary files are consumed by SimWrapper to generate the dashboard. SimWrapper is a web platform that can display either individual full-page data visualizations or collections of visualizations in \u201cdashboard\u201d format. It expects the simulation outputs to be local files on your filesystem; there is no need to upload the summary files to a centralized database or cloud server to create the dashboard.

      For setting up the visualization in SimWrapper, configuration files (in YAML format) are created that provide all the config details to get it up and running, such as which data to load, how to lay out the dashboard, what type of chart to create etc. Refer to SimWrapper documentation here to get more familiar with it.

      "},{"location":"visualizer.html#setup","title":"Setup","text":"

      The visualizer is currently deployed to compare 3 scenario runs at once. Running the data pipeline and post-processing for each scenario is controlled through the process_scenarios Python script, and the scenarios are configured in the scenarios.yaml file. Users need to modify this YAML file to specify the scenarios they would like to compare. There are two categories of scenarios to specify: RSM runs and ABM (donor model) runs. For each scenario run, specify the directories of the input and report folders in this configuration file. Files from the input and report folders of each scenario are then used in the data pipeline tool and post-processing step to create summaries in the processed folder of the SimWrapper directory. Additional scenarios can be compared by extending the configuration in this YAML file.

      "},{"location":"visualizer.html#visualization","title":"Visualization","text":"

      Currently there are five default visualization summaries in the visualizer:

      "},{"location":"visualizer.html#bar-charts","title":"Bar Charts","text":"

      These charts compare VMT, mode shares, transit boardings, and trip-purpose distribution by time of day. Here is a snapshot of a sample YAML configuration file for a bar chart:

      You can add as many charts as you want to the layout. For each chart, specify a CSV file containing the summaries; the column names in the configuration must match the column names in the CSV file. Bar charts support other options as well, which you can learn more about here.
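Since a mismatch between the configured column names and the CSV header is an easy mistake to make, a small check like the one below can be run before opening the dashboard. It is a hypothetical helper, not part of the RSM visualizer; chart_columns would hold the x and column names taken from a chart's configuration.

```python
import csv


def missing_columns(csv_path, chart_columns):
    # Return the configured column names that do not appear in the CSV
    # header; an empty list means the chart config and the file agree.
    with open(csv_path, newline='') as f:
        header = next(csv.reader(f), [])
    present = {name.strip() for name in header}
    return [col for col in chart_columns if col not in present]
```

The comparison is case-sensitive on purpose, because a column named VMT in the configuration will not match a vmt header in the file.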

      Here is how the visual looks in the dashboard:

      "},{"location":"visualizer.html#network-flows","title":"Network Flows","text":"

      These charts compare flows and VMT on the network; any two scenarios can be compared on one network. Here is a snapshot of the configuration file:

      Each network requires the CSV files for the two scenario summaries and an underlying network file in GeoJSON format. The supporting script creates the GeoJSON files from the model outputs for SimWrapper. For more information on the network visualization specification, see here.
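The sketch below shows the general shape of such a conversion: link records become LineString features in a GeoJSON FeatureCollection. The input format (a link_id plus a list of coordinate pairs) and the function names are hypothetical, not the supporting script's actual interface. GeoJSON (RFC 7946) expects longitude/latitude positions in WGS 84, so network coordinates stored in a projected system would need reprojecting first.

```python
import json


def links_to_geojson(links):
    # links: iterable of dicts with a 'link_id' and a 'coords' list of
    # (longitude, latitude) pairs; this input shape is hypothetical.
    features = []
    for link in links:
        features.append({
            'type': 'Feature',
            'geometry': {
                'type': 'LineString',
                # GeoJSON (RFC 7946) orders positions as [lon, lat]
                'coordinates': [list(pt) for pt in link['coords']],
            },
            'properties': {'link_id': link['link_id']},
        })
    return {'type': 'FeatureCollection', 'features': features}


def write_geojson(links, path):
    # Serialize the FeatureCollection to a .geojson file
    with open(path, 'w') as f:
        json.dump(links_to_geojson(links), f)
```

Keeping only a link identifier in the properties keeps the file small; the scenario flow and VMT values stay in the CSV summaries and are joined on the link identifier.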

      Here is how the visual looks in the dashboard:

      "},{"location":"visualizer.html#sample-rate-map","title":"Sample Rate Map","text":"

      This visual is a map showing the RSM sample rate for each zone. Here is a snapshot of the configuration file:

      Each map requires a CSV file of sample rates and a map of zones in .shp format. For more information on the visualization specification, see here.

      Here is how the visual looks in the dashboard:

      "},{"location":"visualizer.html#zero-car-map","title":"Zero Car Map","text":"

      This visual is a map showing the distribution of zero-car households. Here is a snapshot of the configuration file:

      Each map requires a CSV file of household rates and a map of zones in .shp format. For more information on the visualization specification, see here.
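One way to produce a per-zone rate for this map is the share of households with zero autos in each zone. The sketch below computes that share from a household file; the default column names (mgra, autos) and the function name are assumptions, so match them to the actual households file before use.

```python
import csv
from collections import defaultdict


def zero_car_share(households_csv, zone_col='mgra', autos_col='autos'):
    # Share of households with zero autos in each zone. The default column
    # names are assumptions; match them to the actual households file.
    totals = defaultdict(int)
    zero = defaultdict(int)
    with open(households_csv, newline='') as f:
        for row in csv.DictReader(f):
            zone = row[zone_col]
            totals[zone] += 1
            if float(row[autos_col]) == 0:
                zero[zone] += 1
    return {zone: zero[zone] / totals[zone] for zone in totals}
```

Zones with households but no zero-car households get a share of 0.0 rather than being dropped, so the map shows them instead of leaving gaps.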

      Here is how the visual looks in the dashboard:

      "},{"location":"visualizer.html#od-flows","title":"OD Flows","text":"

      This chart shows origin-destination (OD) trip flows. Here is a snapshot of the configuration file:

      Each map requires a CSV file of OD trip flows and a map of zones in .shp format. For more information on the visualization specification, see here.
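The OD flow file is essentially a count of trips per origin-destination pair. The sketch below builds that table from a trip-level CSV; the column names orig_zone and dest_zone, the min_trips threshold and the function name are hypothetical.

```python
import csv
from collections import Counter


def od_flows(trips_csv, orig_col='orig_zone', dest_col='dest_zone',
             min_trips=1):
    # Count trips for each (origin, destination) pair and keep pairs with
    # at least min_trips trips, largest flows first.
    counts = Counter()
    with open(trips_csv, newline='') as f:
        for row in csv.DictReader(f):
            counts[(row[orig_col], row[dest_col])] += 1
    return [(o, d, n) for (o, d), n in counts.most_common() if n >= min_trips]
```

Raising min_trips is a simple way to keep the map readable, since drawing every low-volume zone pair tends to clutter an OD flow visual.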

      Here is how the visual looks in the dashboard:

      You can also modify the data and configuration of each visual in the SimWrapper interface. Each visual has a configuration button (see below), where you can add data and modify the map configuration. You can also export these configurations to a YAML file for future use.

      "},{"location":"visualizer.html#how-to-run","title":"How to Run","text":"

      The first step in running the visualizer is to bring in the scenario files. The visualizer is currently set up to compare three scenarios: donor_model, rsm_base and rsm_scen. donor_model is the ABM run, rsm_base is the baseline (no-action) RSM run, and rsm_scen is the project (action) RSM run.

      • For each of the three scenarios, copy the report folder from its scenario run to the “visualizer/simwrapper/data/external/[scenario_name]/reports” folder. For instance, for donor_model, copy the report folder to “visualizer/simwrapper/data/external/donor_model/reports”.

      • For the RSM scenarios only, copy the mgra_crosswalk.csv and households.csv files from the scenario input folder to the “visualizer/simwrapper/data/external/[scenario_name]/input” folder. Next, rename “households.csv” to “households_orig.csv”. At this point, the input folder for each RSM scenario in the simwrapper folder should look like this:
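The two copy steps above can be scripted, which helps when the same scenarios are refreshed often. The sketch below is a hypothetical helper, not part of the RSM repository; it assumes each scenario run has folders named report and input directly under the run directory, and it needs Python 3.8 or later for dirs_exist_ok.

```python
import os
import shutil


def stage_scenario(scenario_dir, simwrapper_dir, name, is_rsm):
    # Copy one scenario's files into the layout the visualizer expects:
    #   <simwrapper_dir>/data/external/<name>/reports   (all scenarios)
    #   <simwrapper_dir>/data/external/<name>/input     (RSM scenarios only)
    # Assumes the run has 'report' and 'input' folders under scenario_dir.
    dest = os.path.join(simwrapper_dir, 'data', 'external', name)
    shutil.copytree(os.path.join(scenario_dir, 'report'),
                    os.path.join(dest, 'reports'), dirs_exist_ok=True)
    if is_rsm:
        input_dest = os.path.join(dest, 'input')
        os.makedirs(input_dest, exist_ok=True)
        src_input = os.path.join(scenario_dir, 'input')
        shutil.copy2(os.path.join(src_input, 'mgra_crosswalk.csv'), input_dest)
        # households.csv is stored under the name households_orig.csv
        shutil.copy2(os.path.join(src_input, 'households.csv'),
                     os.path.join(input_dest, 'households_orig.csv'))
    return dest
```

For example, stage_scenario('path/to/rsm_base_run', 'visualizer/simwrapper', 'rsm_base', True) would stage the baseline RSM run, and passing False for donor_model copies only its report folder.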

      As mentioned earlier, if you wish to add more RSM scenarios for comparison, you can do so by modifying the scenarios.yaml file. Copy the rsm_scen section, paste it below the original, and change “rsm_scen” to the new scenario name. Note that you will also need to add the new scenario's configuration to the Data Pipeline and Post-Processing steps.
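The copy-and-rename step can be sketched on an in-memory dictionary standing in for the parsed scenarios.yaml. The dictionary layout (category, then scenario name, then folder paths) and the function add_scenario are hypothetical; in practice you would edit the YAML file directly as described above.

```python
import copy


def add_scenario(scenarios, new_name, template='rsm_scen'):
    # Copy the template scenario's settings, replace the template name in
    # every folder path with the new scenario name, and register it under
    # the same category. Returns a new dict; the input is not modified.
    updated = copy.deepcopy(scenarios)
    for category in updated.values():
        if template in category:
            category[new_name] = {
                key: value.replace(template, new_name)
                for key, value in category[template].items()
            }
            return updated
    raise KeyError(f'template scenario {template!r} not found')
```

Renaming inside the copied paths mirrors the manual step of changing rsm_scen to the new name throughout the pasted section.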

      Once you have copied the required scenario files and set up the configuration, you are ready to run the visualizer.

      • Open Anaconda prompt and change the directory to visualizer folder in your local RSM repository.

      • Run the process_scenarios script by typing the command below and pressing Enter.

        python process_scenarios.py

      • Processing all the scenarios through the pipeline will take some time.

      • Once the script runs successfully, it will have created the summary files for each scenario to feed into SimWrapper.

      • Finally, open https://simwrapper.github.io/site/ in a web browser.

      • Click the ‘Enter Site’ button, then click ‘add local folder’ and add the simwrapper directory (visualizer/simwrapper) to run the SimWrapper visualizer for RSM.

      "}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index df4d219deda969641b3a40a2cbd60166eb19b110..0f890aa6ecc38b9aeb70ea930deeb053673551bb 100644 GIT binary patch delta 15 WcmZo-YGPuS@8;m><^MR5{Wky~as=uC delta 15 WcmZo-YGPuS@8;k*d+WnQ_TK;`