US State Wise Wages Analysis using Azure Machine Learning

Azure-ML-US-Wage-Regression-Analysis

Overview of the Study

This project performs a regression analysis of employee wages across the US, using features such as the employee's industry, area, and state, to find out how these features affect wages. It also examines which factors contribute most to the variation in wages.

The entire study is carried out in the cloud using Azure ML Studio, and the models are deployed in the same cloud environment.

This project was inspired by recent concerns and changes pertaining to employment in the United States and their impact on business intelligence.

Data Assets

The data is collected from the US Bureau of Labor Statistics and is sourced from the "State and Metro Area Employment" and "Hours and Earnings" data. More information about the data is available at the following URL: https://learn.microsoft.com/en-us/azure/open-datasets/dataset-us-state-employment-earnings?tabs=azure-storage#data-access

The dataset can also be obtained from Azure Open Datasets. It contains more than 6.4 million records.

Feature Engineering Analysis

We analyze the importance of different features in our models to understand which factors contribute most to the variation in wages.

Variable Importances for Random Forest and LightGBM:

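# The enclosing function name is illustrative; it is added so the snippet is valid Python.
# get_mapper_0 ... get_mapper_3 are defined elsewhere in the training script.
def build_feature_union():
    from sklearn.pipeline import FeatureUnion

    # Columns are grouped by the transformation each group receives.
    column_group_1 = ['state_code', 'data_type_code', 'supersector_code', 'period', 'footnote_codes', 'supersector_name', 'data_type_text', 'state_name']
    column_group_2 = ['seasonal']
    column_group_0 = ['area_code', 'industry_code', 'industry_name', 'area_name']
    column_group_3 = [['year']]

    # Concatenate the outputs of the four mappers into one feature matrix.
    feature_union = FeatureUnion([
        ('mapper_0', get_mapper_0(column_group_0)),
        ('mapper_1', get_mapper_1(column_group_1)),
        ('mapper_2', get_mapper_2(column_group_2)),
        ('mapper_3', get_mapper_3(column_group_3)),
    ])
    return feature_union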
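
One model-agnostic way to probe which features matter is permutation importance: shuffle one feature column at a time and measure how much the prediction error grows. The sketch below is a minimal pure-Python illustration of that idea, not the method used in this study. The function `permutation_importance`, the stand-in model `toy_predict`, and the toy data are all hypothetical.

```python
import random


def permutation_importance(predict, rows, targets, n_features, repeats=5, seed=0):
    """Return the mean increase in MAE when each feature column is shuffled."""
    rng = random.Random(seed)

    def mae(data):
        return sum(abs(predict(r) - t) for r, t in zip(data, targets)) / len(targets)

    baseline = mae(rows)
    importances = []
    for j in range(n_features):
        increase = 0.0
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increase += mae(shuffled) - baseline
        importances.append(increase / repeats)
    return importances


# Hypothetical stand-in for a fitted model: the prediction depends on feature 0 only.
def toy_predict(row):
    return 2.0 * row[0]


rows = [[float(i), float(i % 3)] for i in range(1, 9)]
targets = [2.0 * r[0] for r in rows]
scores = permutation_importance(toy_predict, rows, targets, n_features=2)
print(scores)
```

A feature the model ignores scores zero, while a feature the model relies on scores above zero. With a real model, `predict` would wrap the fitted estimator and `rows` would hold the encoded feature values.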

Model Performance Metrics

Regression Metrics

The following are some of the regression metrics used in the study:
1. Variance
2. Mean Absolute Percentage Error
3. Mean Absolute Error
4. Normalized Mean Absolute Error
5. R2_Score
6. Root Mean Squared Error
7. Normalized Root Mean Squared Error

Code Snippet for Metrics Analysis

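# The enclosing function signature is assumed; it is added so the snippet is valid Python.
# get_metrics_names() is defined elsewhere in the training script.
def calculate_metrics(model, y, sample_weights, X_test, y_test):
    import numpy as np
    from azureml.training.tabular.preprocessing._dataset_binning import make_dataset_bins
    from azureml.training.tabular.score.scoring import score_regression

    # Predict on the held-out test set.
    y_pred = model.predict(X_test)

    # Range and spread of the target y, passed to score_regression below.
    y_min = np.min(y)
    y_max = np.max(y)
    y_std = np.std(y)

    bin_info = make_dataset_bins(X_test.shape[0], y_test)
    metrics = score_regression(
        y_test, y_pred, get_metrics_names(), y_max, y_min, y_std, sample_weights, bin_info)
    return metrics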
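
To make the metric definitions concrete, here is a minimal pure-Python sketch that computes several of the metrics listed above from a pair of true and predicted values. It is an illustration under stated assumptions, not the Azure ML implementation. The function `regression_metrics` is hypothetical, the normalized metrics are assumed to divide by the range of the true values, and the true values are assumed to be non-zero and not all identical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute a few common regression metrics for two equal-length sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE assumes no true value is zero (reasonable for wages).
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # Assumption: normalization divides by the range of the true values.
    y_range = max(y_true) - min(y_true)

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "mean_absolute_error": mae,
        "mean_absolute_percentage_error": mape,
        "root_mean_squared_error": rmse,
        "normalized_mean_absolute_error": mae / y_range,
        "normalized_root_mean_squared_error": rmse / y_range,
        "r2_score": 1.0 - ss_res / ss_tot,
    }


# Made-up values for illustration only; these are not results from the study.
example = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

The normalized variants make errors comparable across targets with different scales, which helps when wages differ widely between areas or industries.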

Results

The results revealed that, apart from the experience of the employees, the industry and especially the area of the employees also have a strong effect on the variation in wages.

Installation

To set up the project environment:

Clone the repository:

git clone https://github.com/GaneshKotaSLU/Azure-ML---US-Wage-Regression-Analysis.git

Navigate to the Project Directory:

cd Azure-ML---US-Wage-Regression-Analysis

Challenges and Limitations

Data quality and completeness varied across different employee segments.
Because the project is hosted in Azure, the application becomes inaccessible once the subscription expires or the resource quota is exhausted.
An Azure subscription is required to deploy the model to the server.
Because of the high volume of available data, data processing and model building take a long time.

Contributing

Contributions to this project are welcome. Please follow these steps:

Fork the repository
Create a new branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Citation

If you use this work in your research, please cite:

Kota, G. (2023). Regression analysis of US employee wages using Azure ML. GitHub repository, https://github.com/GaneshKotaSLU/Azure-ML---US-Wage-Regression-Analysis

Technologies Used

Below are a few of the technologies used in this project.

Python 3.8+
Azure Machine Learning Studio
LightGBM
Tree-based models
Pandas
Scikit-learn
Matplotlib

Next Steps

This project can be further enhanced by incorporating additional information, such as the employees' domain and country, and it can be run entirely on live data as long as the cloud subscription is active.

Support

Support our work by starring our GitHub repository. For any questions or suggestions, please open an issue in the repository.
