new: add new plants (contd. from !26) (et/somenergia-jardiner!29)
* chore: bump format and add context

* change: remove unused seed


* dev: add .prettierrc.yaml config file

* change: update outdated metric value in WHERE clause

* new: add dbt tests for raw data from dset

* chore: minor formatting

* change: set snapshot target schema to lake

* change: add type casting to raw model

* dev: add .editorconfig

* change: remove outdated columns and minor formatting

* change: refactor snapshots

- set source name to airbyte for schema airbyte_imported instead of plantlake, since it was confusing
- rename models to signal_denormalized instead of signal__denormalized (double underscore) because it was confusing
- change destination schema to dbt_snapshots as per dbt recommendation to not mix dbt models and snapshots in the same schema
- create snapshot for signal_denormalized

* change: minor housekeeping

* chore: add some source docs

* fix: metrics have been renamed, fragile pivoting

* finish plant_uuid support; plant_id has spread like a virus, so we'll leave it until we deprecate plantmonitor

* fix: propagate plant_uuid, start replacing plant_id by plant_plantmonitor_id

* dev: update .gitignore with duckdb and dbt internals

* new: add snapshot as source for signal__normalized

also replace signal_normalized with signal__normalized for consistency

* fix: migrate references from seed to snapshot

* change: migrate seed associating signals and devices to a table

this changes the models currently pointing to a seed and points them to a sql table instead. it also creates the source definition in yaml files and ports the descriptions from the previous seed.

* dev: merge duplicate keys in .sqlfluff
diegoquintanav committed Nov 20, 2023
1 parent 3be1953 commit 72d49d7
Showing 29 changed files with 286 additions and 866 deletions.
18 changes: 18 additions & 0 deletions .editorconfig
@@ -0,0 +1,18 @@
# EditorConfig is awesome: https://EditorConfig.org

# top-most EditorConfig file
root = true

[*]
indent_style = space
indent_size = 4
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = false
insert_final_newline = true

[*.sql]
indent_size = 2

[*.{yaml,yml}]
indent_size = 2
8 changes: 7 additions & 1 deletion .gitignore
@@ -1,5 +1,11 @@
# handmade ignores

# duckdb
*.duckdb

# dbt internals
dbt_jardiner/pre/

# tmux shenanigans
tmux_jardiner
..lock
@@ -230,4 +236,4 @@ fabric.properties
.idea/httpRequests

# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser
.idea/caches/build_file_checksums.ser
10 changes: 10 additions & 0 deletions .prettierrc.yaml
@@ -0,0 +1,10 @@
---
semi: false
overrides:
- files: "*.js"
options:
semi: true
- files:
- "*.html"
options:
tabWidth: 2
6 changes: 2 additions & 4 deletions .sqlfluff
@@ -1,13 +1,11 @@
[sqlfluff]
exclude_rules = layout.indent, layout.cte_bracket, layout.select_targets
exclude_rules = layout.indent, layout.cte_bracket, layout.select_targets, LT05
# set max_line_length to whatever you set in sqlfmt
max_line_length = 120
# don't check for line length, let sqlfmt do that
exclude_rules = LT05

[sqlfluff:rules]
capitalisation_policy = lower
extended_capitalisation_policy = lower

[sqlfluff:rules:convention.terminator]
multiline_newline = True
multiline_newline = True
@@ -4,22 +4,35 @@
We therefore set it to 0 between midnight and 3 since all plants are photovoltaic #}
{# TODO we should pass the inverter_energy to incremental instead of accumulative
since this is ugly as fuck #}

{#
This model is followed by a pivot which wants to distinguish between meter_exported_energy and inverter_exported_energy

case when the content of the google sheets is fragile. We should limit the amount of metrics
to what we need to uniformize at the dropdown of
[sheet](https://docs.google.com/spreadsheets/d/1ybUXREO8cMaLMlV4Kt2iYyoNg2msirnbTTiYL2PBY2M).

#}
with
dset_key_metrics as (
select date_trunc('hour', ts) as start_hour, plant, device_type, metric as split_metric, signal_value
from {{ ref("int_dset_responses__values_incremental") }}
where metric in ('inverter_energy', 'irradiance', 'exported_energy')
where metric in ('energia_activa_exportada', 'irradiancia')
)
select
start_hour,
plant,
case when split_metric = 'irradiance' then 'irradiation' else split_metric end as metric,
case
when split_metric = 'inverter_energy' and device_type = 'inverter'
when split_metric = 'irradiancia' then 'irradiation' {# from W/m^2 to Wh/m^2 because split_metric has hourly granularity #}
when split_metric = 'energia_activa_exportada' then device_type || '_exported_energy'
else split_metric
end as metric,
case
when split_metric = 'energia_activa_exportada' and device_type = 'inverter'
then (extract(hour from start_hour) > 3)::integer * (max(signal_value) - min(signal_value)) {# we have random-ish resets before 3 #}
when split_metric = 'irradiance' and device_type in ('sensor', 'module', 'inverter')
when split_metric = 'irradiancia' and device_type in ('sensor', 'module', 'inverter')
then avg(signal_value)
when split_metric = 'exported_energy' and device_type = 'meter'
when split_metric = 'energia_activa_exportada' and device_type = 'meter'
then max(signal_value) - min(signal_value)
else null
end as metric_value
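
The two `case` expressions above can be sketched outside SQL to make the intent easier to check. This is a minimal Python sketch, not the model itself: the function names `pivot_metric_name` and `hourly_metric_value` are hypothetical, it assumes the readings passed in already belong to a single plant, device type, metric and hour (the grouping is in the part of the model not shown here), and it follows the renamed metrics (`irradiancia`, `energia_activa_exportada`).

```python
from statistics import mean
from typing import Optional, Sequence


def pivot_metric_name(split_metric: str, device_type: str) -> str:
    """Mirror the first CASE: map the raw metric to the column name used by the pivot."""
    if split_metric == "irradiancia":
        return "irradiation"
    if split_metric == "energia_activa_exportada":
        # e.g. 'meter_exported_energy' or 'inverter_exported_energy'
        return f"{device_type}_exported_energy"
    return split_metric


def hourly_metric_value(
    split_metric: str,
    device_type: str,
    hour_of_day: int,
    values: Sequence[float],
) -> Optional[float]:
    """Mirror the second CASE for the readings of one hour (assumed pre-grouped)."""
    if not values:
        return None
    if split_metric == "energia_activa_exportada" and device_type == "inverter":
        # Accumulated counter: the hourly energy is max - min, forced to 0
        # for hours 0..3 because of the resets mentioned in the model comment.
        return (1 if hour_of_day > 3 else 0) * (max(values) - min(values))
    if split_metric == "irradiancia" and device_type in ("sensor", "module", "inverter"):
        return mean(values)
    if split_metric == "energia_activa_exportada" and device_type == "meter":
        return max(values) - min(values)
    return None
```

Note that the guard is `> 3`, so hour 3 itself is also zeroed for inverters, while meters are never zeroed.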
@@ -6,7 +6,8 @@
e.g. signal_value * (case when signal_unit = 'kwh' then 1000 else 1 end)#}

SELECT
metadata.plant,
metadata.plant_uuid,
metadata.plant,
metadata.signal,
metadata.metric,
metadata.device,
@@ -27,5 +28,5 @@ SELECT
valors.queried_at,
valors.ts,
valors.signal_value as signal_value
FROM {{ ref('seed_signals__with_devices') }} AS metadata
LEFT JOIN {{ ref('int_dset_responses__deduplicated') }} AS valors USING(signal_uuid)
FROM {{ ref('raw_gestio_actius__signal_denormalized') }} AS metadata
LEFT JOIN {{ ref('int_dset_responses__deduplicated') }} AS valors USING(signal_uuid)
@@ -1,5 +1,5 @@
{{ config(materialized="table") }}

select distinct plant, device, device_uuid, device_type, device_parent
from {{ ref("seed_signals__with_devices") }}
select distinct plant, plant_uuid, device, device_uuid, device_type, device_parent
from {{ ref('raw_gestio_actius__signal_denormalized') }}
order by plant, device
@@ -23,19 +23,19 @@ with pot_instantanea_planta as (
)
select
p.plant_name as nom_planta,
--municipality as municipi,
province as provincia,
technology as tecnologia,
peak_power_kw as potencia_pic_kw,
nominal_power_kw as potencia_nominal_kw,
-- p.municipality as municipi,
p.province as provincia,
p.technology as tecnologia,
p.peak_power_kw as potencia_pic_kw,
p.nominal_power_kw as potencia_nominal_kw,
i.ultim_registre_pot_instantanea,
i.pot_instantantanea_planta_kw,
ir.ts as ultim_registre_irradiacio,
ir.signal_value as irradiacio,
ppd.dia,
ppd.energia_exportada_comptador_kwh,
ppd.energia_esperada_solargis_kwh
from {{ ref('seed_plants__parameters') }} p
from {{ ref('raw_gestio_actius_plant_parameters') }} p
left join pot_instantanea_planta i
on i.plant = p.plant_name
left join plant_production_daily_previous_day ppd on ppd.nom_planta = p.plant_name
@@ -5,17 +5,18 @@
#}
with plants as (
select distinct plant from {{ ref('int_signal_device_relation__distinct_devices') }}
select distinct plant, plant_uuid from {{ ref('int_signal_device_relation__distinct_devices') }}
)
select
spine.start_hour,
plant_metadata.plant_id as plant_id,
plants.plant_uuid,
plants.plant as plant_name,
plant_metadata.plant_id as plantmonitor_plant_id,
plant_metadata.peak_power_kw::float as peak_power_kw,
plant_metadata.technology as technology,
dset.irradiation as dset_irradiation_wh,
dset.inverter_energy as dset_inverter_energy_kwh,
dset.exported_energy as dset_meter_instant_exported_energy_kwh,
dset.inverter_exported_energy as dset_inverter_energy_kwh,
dset.meter_exported_energy as dset_meter_instant_exported_energy_kwh,
NULL::integer as dset_meter_exported_energy_kwh,
NULL::integer as dset_meter_imported_energy_kwh,
forecast.forecastdate as forecast_date,
@@ -25,14 +26,14 @@ select
sr.energy_output_kwh as satellite_energy_output_kwh,
omie.price as omie_price_eur_mwh,
{# exported_energy should be in wh we pass it to kwh. Also the /1000 is GSTC[W/m2] #}
(dset.exported_energy*1000 / plant_metadata.peak_power_kw::float) / (NULLIF(sr.tilted_irradiation_wh_m2, 0.0) / 1000.0) as pr_hourly,
(dset.meter_exported_energy*1000 / plant_metadata.peak_power_kw::float) / (NULLIF(sr.tilted_irradiation_wh_m2, 0.0) / 1000.0) as pr_hourly,
spine.start_hour between solar_events.sunrise_real and solar_events.sunset_real as is_daylight_real,
spine.start_hour between solar_events.sunrise_generous and solar_events.sunset_generous as is_daylight_generous,
round(meter_registry.export_energy_wh/1000,2) as erp_meter_exported_energy_kwh,
round(meter_registry.import_energy_wh/1000,2) as erp_meter_imported_energy_kwh
from {{ ref('spine_hourly') }} as spine
left join plants ON TRUE
left join {{ ref('seed_plants__parameters') }} plant_metadata on plants.plant = plant_metadata.plant_name
left join {{ ref('raw_gestio_actius_plant_parameters') }} plant_metadata on plants.plant_uuid = plant_metadata.plant_uuid
left join {{ ref('int_dset_metrics_wide_hourly') }} dset using(start_hour, plant)
left join {{ ref('int_energy_forecasts__best_from_plantmonitordb') }} forecast using(start_hour, plant_id)
left join {{ ref('int_satellite_readings__hourly') }} sr
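
The `pr_hourly` expression in this hunk relies on `NULLIF` to avoid dividing by a zero irradiation. A minimal Python sketch of the same arithmetic follows, with `None` standing in for SQL `NULL`. The function name `pr_hourly` and its signature are illustrative only; the sketch reproduces the expression as written in the diff and does not attempt to settle the unit question raised in the model's own comment.

```python
from typing import Optional


def pr_hourly(
    meter_exported_energy: Optional[float],
    peak_power_kw: Optional[float],
    tilted_irradiation_wh_m2: Optional[float],
) -> Optional[float]:
    """Reproduce the SQL expression, treating None like SQL NULL."""
    # Any NULL operand makes the whole SQL expression NULL.
    if (
        meter_exported_energy is None
        or peak_power_kw is None
        or tilted_irradiation_wh_m2 is None
    ):
        return None
    # NULLIF(sr.tilted_irradiation_wh_m2, 0.0) turns a zero irradiation into NULL.
    if tilted_irradiation_wh_m2 == 0:
        return None
    # The trailing / 1000.0 is the GSTC [W/m2] factor mentioned in the model comment.
    return (meter_exported_energy * 1000 / peak_power_kw) / (
        tilted_irradiation_wh_m2 / 1000.0
    )
```

Only the irradiation is guarded: a `peak_power_kw` of zero is not protected by `NULLIF` in the SQL and raises `ZeroDivisionError` in this sketch as well.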
@@ -10,7 +10,7 @@ select
module_temperature_dc,
energy_output_kwh
from {{ ref('raw_solargis_satellite_readings__temp_and_pv_energy') }} sg
left join {{ref('seed_plants__parameters')}} p on p.plant_id = sg.plant_id
left join {{ref('raw_gestio_actius_plant_parameters')}} p on p.plant_id = sg.plant_id

-- SolarGis PVOUT (here photovoltaic_energy_output_wh) returns the energy in kWh, but plantmonitor mistakenly records it as Wh without applying any transformation.
-- We understand that in redash this is being corrected by hand before the value is displayed.
@@ -9,7 +9,7 @@ with inverters_energy as (
signal_unit,
signal_value
from {{ ref('int_dset_responses__values_incremental') }}
where device_type in ('inverter') and metric = 'inverter_energy'
where device_type in ('inverter') and metric = 'energia_activa_exportada'
), production_hourly as (
select
start_hour,
@@ -46,4 +46,4 @@ select
plant as nom_planta,
device as aparell,
inverter_energy_MWh as energia_inversor_mwh
from production_monthly
from production_monthly
@@ -14,6 +14,6 @@ select
signals.device_type,
signals.device_uuid,
coalesce(rebut_from_dset, false) as rebut_from_dset
from {{ ref("seed_signals__with_devices") }} as signals
from {{ ref('raw_gestio_actius__signal_denormalized') }} as signals
left join valors on signals.signal_uuid = valors.signal_uuid
order by plant, signal
6 changes: 3 additions & 3 deletions dbt_jardiner/models/jardiner/marts/dm_plants.sql
@@ -1,7 +1,7 @@
{{ config(materialized='view') }}

select
plant as nom_planta,
plant_name as nom_planta,
municipality as municipi,
province as provincia,
latitude as latitud,
@@ -10,10 +10,10 @@ select
nominal_power_kw as potencia_nominal_kw,
technology as tecnologia,
connection_date as connexio,
owner as manteniment,
"owner" as manteniment,
n_strings_plant as "strings/planta",
n_modules_string as "moduls/string",
n_strings_inverter as "strings/inversor",
esquema_unifilar,
layout
from {{ ref('raw_gestio_actius_plant_parameters') }}
from {{ ref('raw_gestio_actius_plant_parameters') }}
@@ -1,8 +1,91 @@
version: 2

models:

- name: raw_gestio_actius_production_target
description: >
Contains the incremental table created by the snapshot (in the lake schema)
that reads the drive source imported by airbyte (in the airbyte_imported schema)
that reads the drive source imported by airbyte (in the airbyte_imported schema)
- name: raw_gestio_actius__signal_denormalized
description: >
Table with the list of devices, their uuids and whether they have a parent.
Contains the relation between the signal_description of the DSET API (a UUID
assigned by us in the mapping document that is sent to them) and a device_uuid,
also assigned by us, that corresponds to the column of the same name in the
relevant device table, be it inverter, string, sensor, plant, etc.
This table is equivalent to the 'seed' data/seed_signals__with_devices.csv, which is loaded
into the table using the script scripts/file_to_sql.py.
Example of a sent mapping: [Llanillos](https://docs.google.com/spreadsheets/d/1op_WHvGZNyDdkBD7EOXK-CPU5-aGjK3r/edit#gid=629299947)
columns:
- name: signal_id
description: internal id of the db
- name: plant_uuid
description: uuid4 of the plant
- name: plant
description: name of the plant
- name: signal
description: name of the signal
- name: metric
description: name of the metric
tests:
- accepted_values:
config:
severity: error
error_if: ">0"
values:
- "comunicacio_ok"
- "energia_activa_exportada"
- "energia_activa_importada"
- "energia_reactiva_q1"
- "energia_reactiva_q2"
- "energia_reactiva_q3"
- "energia_reactiva_q4"
- "frecuencia"
- "intensitat_bt_fase_r"
- "intensitat_bt_fase_s"
- "intensitat_bt_fase_t"
- "intensitat_dc"
- "irradiancia"
- "potencia_activa"
- "potencia_activa_fase_r"
- "potencia_activa_fase_s"
- "potencia_activa_fase_t"
- "temperatura_ambient"
- "temperatura_dispositiu"
- "temperatura_pv_modul"
- "temperatura_superficie"
- "voltatge_bt_fase_r"
- "voltatge_bt_fase_s"
- "voltatge_bt_fase_t"
- "voltatge_dc"
- "voltatge_mt_fase_r"
- "voltatge_mt_fase_s"
- "voltatge_mt_fase_t"
- name: device
description: name of the device
- name: device_type
description: type of device
tests:
- accepted_values:
values:
["meter", "inverter", "sensor", "plant", "string", "module"]
- name: device_parent
description: parent device of the device, e.g. a string has inversor1 as its parent
- name: signal_uuid
description:
UUID of the signal, present in the signal_UUID column of the mapping. It is assigned manually with
each new mapping from GA.
tests:
- unique
- name: device_uuid
description:
Assigned by us; corresponds to the column of the same name in the
relevant device table, be it inverter, string, sensor, plant, etc.
tests:
- dbt_utils.unique_combination_of_columns:
config:
severity: error
combination_of_columns:
- plant_uuid
- device_uuid
- signal_uuid
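
The two kinds of test declared in this file, `accepted_values` and `dbt_utils.unique_combination_of_columns`, both return the offending rows and fail when any exist. A rough Python sketch of that logic follows, assuming the rows are plain dictionaries. The helper names are hypothetical and this is not how dbt implements the tests; it only illustrates what they check. As far as I understand the SQL behind `accepted_values`, NULL values are not reported, and the sketch mirrors that.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]


def accepted_values_failures(
    rows: Iterable[Row], column: str, accepted: Sequence[Any]
) -> List[Any]:
    """Return the distinct values of `column` that are not in `accepted`."""
    allowed = set(accepted)
    # None is skipped on purpose: NULL never satisfies `NOT IN (...)` in SQL.
    found = {row[column] for row in rows if row[column] is not None}
    return sorted(found - allowed)


def duplicated_combinations(
    rows: Iterable[Row], columns: Sequence[str]
) -> List[Tuple[Any, ...]]:
    """Return every combination of `columns` that appears more than once."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [combo for combo, n in counts.items() if n > 1]
```

With `severity: error` and `error_if: ">0"`, a single unexpected metric name is enough to fail the run, which guards the downstream pivot against renamed metrics.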
@@ -0,0 +1,20 @@
{{ config(materialized='view') }}

WITH fresh AS (
SELECT *
FROM {{ ref("snapshot_signal_denormalized") }} sn
WHERE sn.dbt_valid_to IS NULL
)
SELECT
plant_uuid::uuid,
plant::text,
signal::text,
metric::text,
device::text,
device_type::text,
device_uuid::uuid,
device_parent::text,
signal_uuid::uuid,
inserted_at::timestamptz,
updated_at::timestamptz
FROM fresh
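
The `fresh` CTE keeps only the snapshot rows whose `dbt_valid_to` is NULL, which in a dbt snapshot with default settings are the currently valid versions of each record. A small Python sketch of that filter, assuming snapshot rows represented as dictionaries (the helper name `current_rows` and the sample data in the usage note are made up for illustration):

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def current_rows(snapshot_rows: Iterable[Row]) -> List[Row]:
    """Keep only the rows that have not been superseded by a newer version."""
    # dbt closes a superseded row by filling in dbt_valid_to;
    # the live version of a record keeps it as NULL (None here).
    return [row for row in snapshot_rows if row["dbt_valid_to"] is None]
```

For example, if a signal was first mapped to metric `inverter_energy` and later changed to `energia_activa_exportada`, the snapshot holds both rows, and only the second one (with `dbt_valid_to` unset) reaches the raw model.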