Ready for review (I think)

gieljnssns · Apr 18, 2024 · 1590404
1 parent c51d540
commit 1590404
Showing 4 changed files with 133 additions and 44 deletions.
diff --git a/docs/mlregressor.md b/docs/mlregressor.md
@@ -8,6 +8,7 @@ This API provides two main methods:
 
 - predict: To obtain a prediction from a pre-trained model. This method is exposed with the `regressor-model-predict` end point.
 
+
 ## A basic model fit
 
 To train a model use the `regressor-model-fit` end point.
@@ -45,28 +46,38 @@ A correct `curl` call to launch a model fit can look like this:
 ```
 curl -i -H "Content-Type:application/json" -X POST -d '{}' http://localhost:5000/action/regressor-model-fit
 ```
-
-After applying the `curl` command to fit the model the following information is logged by EMHASS:
-
-    2023-02-20 22:05:22,658 - __main__ - INFO - Training a LinearRegression model
-    2023-02-20 22:05:23,882 - __main__ - INFO - Elapsed time: 1.2236599922180176
-    2023-02-20 22:05:24,612 - __main__ - INFO - Prediction R2 score: 0.2654560762747957
-
-## The predict method
-
-To obtain a prediction using a previously trained model use the `regressor-model-predict` end point.
+A Home Assistant `rest_command` can look like this:
 
 ```
-curl -i -H "Content-Type:application/json" -X POST -d '{}' http://localhost:5000/action/regressor-model-predict
+fit_heating_hours:
+  url: http://127.0.0.1:5000/action/regressor-model-fit
+  method: POST
+  content_type: "application/json"
+  payload: >-
+    {
+    "csv_file": "heating_prediction.csv",
+    "features":["degreeday", "solar"],
+    "target": "heating_hours",
+    "regression_model": "RandomForestRegression",
+    "model_type": "heating_hours_degreeday",
+    "timestamp": "timestamp",
+    "date_features": ["month", "day_of_week"]
+    }
 ```
+After fitting the model the following information is logged by EMHASS:
 
-If needed pass the correct `model_type` like this:
+    2024-04-17 12:41:50,019 - web_server - INFO - Passed runtime parameters: {'csv_file': 'heating_prediction.csv', 'features': ['degreeday', 'solar'], 'target': 'heating_hours', 'regression_model': 'RandomForestRegression', 'model_type': 'heating_hours_degreeday', 'timestamp': 'timestamp', 'date_features': ['month', 'day_of_week']}
+    2024-04-17 12:41:50,020 - web_server - INFO -  >> Setting input data dict
+    2024-04-17 12:41:50,021 - web_server - INFO - Setting up needed data
+    2024-04-17 12:41:50,048 - web_server - INFO -  >> Performing a machine learning regressor fit...
+    2024-04-17 12:41:50,049 - web_server - INFO - Performing a MLRegressor fit for heating_hours_degreeday
+    2024-04-17 12:41:50,064 - web_server - INFO - Training a RandomForestRegression model
+    2024-04-17 12:41:57,852 - web_server - INFO - Elapsed time for model fit: 7.78800106048584
+    2024-04-17 12:41:57,862 - web_server - INFO - Prediction R2 score of fitted model on test data: -0.5667567505914477
 
-```
-curl -i -H "Content-Type:application/json" -X POST -d '{"model_type": "load_forecast"}' http://localhost:5000/action/regressor-model-predict
-```
+## The predict method
 
-It is possible to publish the predict method results to a Home Assistant sensor.
+To obtain a prediction using a previously trained model use the `regressor-model-predict` end point.
 
 The list of parameters needed to set the data publish task is:
 
@@ -89,3 +100,66 @@ runtimeparams = {
     "model_type": "heating_hours_degreeday"
 }
 ```
+
+Pass the correct `model_type` like this:
+
+```
+curl -i -H "Content-Type:application/json" -X POST -d '{"model_type": "heating_hours_degreeday"}' http://localhost:5000/action/regressor-model-predict
+```
+
+A Home Assistant `rest_command` can look like this:
+
+```
+predict_heating_hours:
+  url: http://localhost:5000/action/regressor-model-predict
+  method: POST
+  content_type: "application/json"
+  payload: >-
+    {
+    "mlr_predict_entity_id": "sensor.predicted_hours",
+    "mlr_predict_unit_of_measurement": "h",
+    "mlr_predict_friendly_name": "Predicted hours",
+    "new_values": [8.2, 7.23, 2, 6],
+    "model_type": "heating_hours_degreeday"
+    }
+```
+After running the prediction, the following information is logged by EMHASS:
+
+```
+2024-04-17 14:25:40,695 - web_server - INFO - Passed runtime parameters: {'mlr_predict_entity_id': 'sensor.predicted_hours', 'mlr_predict_unit_of_measurement': 'h', 'mlr_predict_friendly_name': 'Predicted hours', 'new_values': [8.2, 7.23, 2, 6], 'model_type': 'heating_hours_degreeday'}
+2024-04-17 14:25:40,696 - web_server - INFO -  >> Setting input data dict
+2024-04-17 14:25:40,696 - web_server - INFO - Setting up needed data
+2024-04-17 14:25:40,700 - web_server - INFO -  >> Performing a machine learning regressor predict...
+2024-04-17 14:25:40,715 - web_server - INFO - Performing a prediction for heating_hours_degreeday
+2024-04-17 14:25:40,750 - web_server - INFO - Successfully posted to sensor.predicted_hours = 3.716600000000001
+```
+The predict method will publish the result to a Home Assistant sensor.
+
+
+## How to store data in a CSV file from Home Assistant
+First, add a `notify` entry that uses the `file` platform:
+```
+notify:
+  - platform: file
+    name: heating_hours_prediction
+    timestamp: false
+    filename: /share/heating_prediction.csv
+```
+Then create an automation that writes a row to this file once a day:
+```
+alias: "Heating csv"
+id: 157b1d57-73d9-4f39-82c6-13ce0cf42
+trigger:
+  - platform: time
+    at: "23:59:32"
+action:
+  - service: notify.heating_hours_prediction
+    data:
+      message: >
+        {% set degreeday = states('sensor.degree_day_daily') |float %}
+        {% set heating_hours = states('sensor.heating_hours_today') |float | round(2) %}
+        {% set solar = states('sensor.solar_daily') |float | round(3) %}
+        {% set time = now() %}
+
+          {{time}},{{degreeday}},{{solar}},{{heating_hours}}
+```
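
For testing outside Home Assistant, the `predict_heating_hours` payload documented above can also be built from Python. This is a minimal sketch using only the standard library. `build_emhass_request` is a hypothetical helper, not part of EMHASS, and the base URL assumes EMHASS listens on `localhost:5000` as in the `curl` examples.

```python
import json
import urllib.request


def build_emhass_request(base_url: str, action: str, payload: dict) -> urllib.request.Request:
    """Build (without sending) a JSON POST request for an EMHASS action endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/action/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same payload as the predict_heating_hours rest_command.
predict_payload = {
    "mlr_predict_entity_id": "sensor.predicted_hours",
    "mlr_predict_unit_of_measurement": "h",
    "mlr_predict_friendly_name": "Predicted hours",
    "new_values": [8.2, 7.23, 2, 6],
    "model_type": "heating_hours_degreeday",
}

request = build_emhass_request(
    "http://localhost:5000", "regressor-model-predict", predict_payload
)

# Sending is left commented out so the sketch runs without a live EMHASS instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes the URL and body easy to inspect before pointing the call at a real instance.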
diff --git a/src/emhass/command_line.py b/src/emhass/command_line.py
@@ -246,34 +246,39 @@ def set_input_data_dict(
         P_PV_forecast, P_load_forecast = None, None
         params = json.loads(params)
         days_list = None
-        csv_file = params["passed_data"]["csv_file"]
-        features = params["passed_data"]["features"]
-        target = params["passed_data"]["target"]
-        timestamp = params["passed_data"]["timestamp"]
-        if get_data_from_file:
-            base_path = base_path + "/data"
-            filename_path = pathlib.Path(base_path) / csv_file
+        csv_file = params["passed_data"].get("csv_file", None)
+        if "features" in params["passed_data"]:
+            features = params["passed_data"]["features"]
+        if "target" in params["passed_data"]:
+            target = params["passed_data"]["target"]
+        if "timestamp" in params["passed_data"]:
+            timestamp = params["passed_data"]["timestamp"]
+        if csv_file:
+            if get_data_from_file:
+                base_path = base_path + "/data"
+                filename_path = pathlib.Path(base_path) / csv_file
 
-        else:
-            filename_path = pathlib.Path(base_path) / csv_file
+            else:
+                filename_path = pathlib.Path(base_path) / csv_file
 
-        if filename_path.is_file():
-            df_input_data = pd.read_csv(filename_path, parse_dates=True)
+            if filename_path.is_file():
+                df_input_data = pd.read_csv(filename_path, parse_dates=True)
 
-        else:
-            logger.error("The cvs file was not found.")
-            raise ValueError("The CSV file " + csv_file + " was not found.")
-        required_columns = []
-        required_columns.extend(features)
-        required_columns.append(target)
-        if timestamp is not None:
-            required_columns.append(timestamp)
+            else:
+                logger.error("The CSV file was not found.")
+                raise ValueError("The CSV file " + csv_file + " was not found.")
+            required_columns = []
+            required_columns.extend(features)
+            required_columns.append(target)
+            if timestamp is not None:
+                required_columns.append(timestamp)
 
-        if not set(required_columns).issubset(df_input_data.columns):
-            logger.error("The cvs file does not contain the required columns.")
-            raise ValueError(
-                f"CSV file should contain the following columns: {', '.join(required_columns)}",
-            )
+            if not set(required_columns).issubset(df_input_data.columns):
+                logger.error("The CSV file does not contain the required columns.")
+                msg = f"CSV file should contain the following columns: {', '.join(required_columns)}"
+                raise ValueError(
+                    msg,
+                )
 
     elif set_type == "publish-data":
         df_input_data, df_input_data_dayahead = None, None
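
Review note: in the hunk above, `features`, `target` and `timestamp` are only assigned when the key is present in `params["passed_data"]`. Unless they are assigned earlier in the function, which this hunk does not show, a request that passes `csv_file` without `features` would fail at `required_columns.extend(features)`. One possible way to make the column check tolerant of missing keys is sketched below. The helper names are hypothetical and the sketch works on plain lists of column names rather than a pandas DataFrame.

```python
def required_csv_columns(passed_data: dict) -> list:
    """Collect the column names a CSV must contain, skipping keys that were not passed."""
    columns = list(passed_data.get("features") or [])
    target = passed_data.get("target")
    timestamp = passed_data.get("timestamp")
    if target is not None:
        columns.append(target)
    if timestamp is not None:
        columns.append(timestamp)
    return columns


def missing_csv_columns(passed_data: dict, available_columns) -> list:
    """Return the required columns that are absent from the CSV header."""
    available = set(available_columns)
    return [name for name in required_csv_columns(passed_data) if name not in available]


fit_data = {
    "csv_file": "heating_prediction.csv",
    "features": ["degreeday", "solar"],
    "target": "hours",
    "timestamp": "timestamp",
}
header = ["timestamp", "degreeday", "solar"]
print(missing_csv_columns(fit_data, header))
```

With a DataFrame, `df_input_data.columns` could be passed as `available_columns`.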

diff --git a/src/emhass/machine_learning_regressor.py b/src/emhass/machine_learning_regressor.py
@@ -190,9 +190,10 @@ def get_regression_model(self: MLRegressor) -> tuple[str, str]:
             param_grid = REGRESSION_METHODS["AdaBoostRegression"]["param_grid"]
         else:
             self.logger.error(
-                "Passed sklearn model %s is not valid",
+                "Passed model %s is not valid",
                 self.regression_model,
             )
+            return None
         return base_model, param_grid
 
     def fit(self: MLRegressor, date_features: list | None = None) -> None:
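
Review note: with the added `return None`, `get_regression_model` no longer matches its `tuple[str, str]` annotation, and a caller that unpacks the result directly would raise a `TypeError` for an invalid model name. The sketch below shows one way a caller could guard against that. It is an illustration only: the registry contents are placeholders, and the `fit` function here is a simplified stand-in for the real method.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Placeholder registry; the entries and parameter grids are illustrative only.
REGRESSION_METHODS = {
    "LinearRegression": {"model": "linear-placeholder", "param_grid": {"fit_intercept": [True, False]}},
    "AdaBoostRegression": {"model": "adaboost-placeholder", "param_grid": {"n_estimators": [50, 100]}},
}


def get_regression_model(name: str) -> Optional[tuple]:
    """Return (base_model, param_grid), or None when the name is unknown."""
    entry = REGRESSION_METHODS.get(name)
    if entry is None:
        logger.error("Passed model %s is not valid", name)
        return None
    return entry["model"], entry["param_grid"]


def fit(name: str) -> bool:
    """Simplified caller: stop early instead of unpacking None."""
    result = get_regression_model(name)
    if result is None:
        return False
    base_model, param_grid = result
    # Training with base_model and param_grid would happen here.
    return True
```

Raising a `ValueError` instead of returning `None` would be an alternative design that keeps the return type a plain tuple.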

diff --git a/src/emhass/utils.py b/src/emhass/utils.py
@@ -228,12 +228,12 @@ def treat_runtimeparams(
             params["passed_data"]["csv_file"] = csv_file
             params["passed_data"]["features"] = features
             params["passed_data"]["target"] = target
-            if "timestamp" not in runtimeparams.keys():
+            if "timestamp" not in runtimeparams:
                 params["passed_data"]["timestamp"] = None
             else:
                 timestamp = runtimeparams["timestamp"]
                 params["passed_data"]["timestamp"] = timestamp
-            if "date_features" not in runtimeparams.keys():
+            if "date_features" not in runtimeparams:
                 params["passed_data"]["date_features"] = []
             else:
                 date_features = runtimeparams["date_features"]
@@ -242,6 +242,15 @@ def treat_runtimeparams(
         if set_type == "regressor-model-predict":
             new_values = runtimeparams["new_values"]
             params["passed_data"]["new_values"] = new_values
+            if "csv_file" in runtimeparams:
+                csv_file = runtimeparams["csv_file"]
+                params["passed_data"]["csv_file"] = csv_file
+            if "features" in runtimeparams:
+                features = runtimeparams["features"]
+                params["passed_data"]["features"] = features
+            if "target" in runtimeparams:
+                target = runtimeparams["target"]
+                params["passed_data"]["target"] = target
 
         # Treating special data passed for MPC control case
         if set_type == "naive-mpc-optim":
@@ -335,7 +344,7 @@ def treat_runtimeparams(
             sklearn_model = runtimeparams["sklearn_model"]
         params["passed_data"]["sklearn_model"] = sklearn_model
         if "regression_model" not in runtimeparams.keys():
-            regression_model = "LinearRegression"
+            regression_model = "AdaBoostRegression"
         else:
             regression_model = runtimeparams["regression_model"]
         params["passed_data"]["regression_model"] = regression_model
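
Review note: the three new `if ... in runtimeparams` blocks for `regressor-model-predict` and the default-model branch follow the same pattern, so they could be condensed. The sketch below shows the idea on plain dictionaries. `copy_optional_keys` and `OPTIONAL_PREDICT_KEYS` are hypothetical names, not existing EMHASS code.

```python
OPTIONAL_PREDICT_KEYS = ("csv_file", "features", "target")


def copy_optional_keys(runtimeparams: dict, passed_data: dict, keys=OPTIONAL_PREDICT_KEYS) -> dict:
    """Copy each optional key into passed_data only when the caller supplied it."""
    for key in keys:
        if key in runtimeparams:
            passed_data[key] = runtimeparams[key]
    return passed_data


runtimeparams = {
    "new_values": [8.2, 7.23, 2, 6],
    "model_type": "heating_hours_degreeday",
    "features": ["degreeday", "solar"],
}
passed_data = {"new_values": runtimeparams["new_values"]}
copy_optional_keys(runtimeparams, passed_data)

# dict.get expresses the default from this commit in a single line.
regression_model = runtimeparams.get("regression_model", "AdaBoostRegression")
passed_data["regression_model"] = regression_model
print(passed_data)
```

Keys that are absent from `runtimeparams` are simply left out of `passed_data`, which matches the behavior of the added blocks.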