Convert data into formats that are easy to ingest into Elasticsearch
A personal learning project aimed at building a better understanding of how to manipulate data formats, e.g.
- SQL -> JSON
- geocoding JSON data by adding latitude and longitude derived from its address fields via an external API (see the sketch after the bulk example below)
- JSON -> NDJSON -> import file to Elasticsearch
- JSON -> NDJSON -> Elasticsearch API `POST /_bulk` command, e.g.

```
POST alpacas/_bulk
{"index":{"_id":1}}
{"country":"NO","alpacaId":9876543210,"keeper":"0123456789","gender":"SEX_FEMALE","alpacaShortName":"ANITA IS COOL","name":"Anita's Alpacas"}
{"index":{"_id":2}}
{"country":"NO","alpacaId":9999543210,"keeper":"0123456789","gender":"SEX_MALE","alpacaShortName":"THOR IS COOL","name":"Anita's Alpacas"}
```
- Clone the repo and navigate into it
- Run `npm install`
- See config.js and override any non-sensitive values in the corresponding environment files, e.g. config.test.json (an illustrative override follows this list)
- A `.env` file in the project root should contain keys for sensitive values, e.g.

```
# Elasticsearch
ELASTIC_CLOUD_ID="UPDATE-ME"
ELASTIC_USERNAME="UPDATE-ME"
ELASTIC_PASSWORD="UPDATE-ME"

# Google API
GOOGLE_MAPS_API_KEY="UPDATE-ME"

# MySQL
MYSQL_PASSWORD="YOUR PASSWORD GOES HERE"
```
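As a purely illustrative example of the environment override mentioned above, a config.test.json might look like the snippet below. Only `db.ssl_ca` is referenced elsewhere in this README; every other key is a hypothetical placeholder, so check config.js for the real structure.

```json
{
  "db": {
    "host": "localhost",
    "database": "alpacas_test",
    "ssl_ca": "./data/your-azure-ssl-cert.pem"
  },
  "elastic": {
    "index": "alpacas_test"
  }
}
```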
MySQL on Azure portal
- Get the certificate from https://portal.azure.com/ under MySQL flexible server > Settings > Networking > Download SSL Certificate
- Put it in the ./data folder, which must be listed in `.gitignore`
- Update the filename to match the config `db.ssl_ca` value (see the connection sketch below)
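For reference, here is a sketch of how that `db.ssl_ca` value would typically be consumed when opening the connection, assuming the `mysql2` package; the repo's actual driver and config loading may differ.

```js
// Sketch: open a MySQL connection using the downloaded Azure CA certificate.
// Assumes mysql2; config keys other than db.ssl_ca are hypothetical.
const fs = require("fs");
const mysql = require("mysql2/promise");
const config = require("./config"); // however the resolved config is exposed

async function connect() {
  return mysql.createConnection({
    host: config.db.host,
    user: config.db.user,
    password: process.env.MYSQL_PASSWORD,
    database: config.db.database,
    ssl: { ca: fs.readFileSync(config.db.ssl_ca) }, // Azure flexible server requires TLS
  });
}

module.exports = { connect };
```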
Pre-conditions
- The `.env` file contains the correct overrides for sensitive values for the current environment
- Create a local folder ./data and store the .sql dump file there
- Local MySQL is running and the database is populated. If not, follow the steps in pre-requisistes.md
- Local env: `npm run sql_to_elastic`, or test env: `npm run sql_to_elastic_test`
- Local env: `npm run sql_to_json`, or test env: `npm run sql_to_json_test` (the SQL -> JSON step is sketched below)
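Conceptually, the SQL -> JSON step behind `npm run sql_to_json` boils down to something like the sketch below; the table name and output filename are illustrative, and the real script takes them from config.

```js
// Sketch: read rows from MySQL and write them out as a JSON array on disk.
const fs = require("fs");

async function sqlToJson(connection) {
  // `connection` is a mysql2 connection like the one sketched above
  const [rows] = await connection.execute("SELECT * FROM alpacas"); // hypothetical table
  fs.writeFileSync("./data/alpacas.json", JSON.stringify(rows, null, 2)); // hypothetical filename
}

module.exports = { sqlToJson };
```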
Automate with Elasticsearch client
- Create the index in Elasticsearch from an existing JSON file: `node json_to_elastic` (a sketch of this step follows the list; edit the JSON filename as needed — TODO: automate getting this from the SQL -> JSON step)
- Verify the index was created in Elasticsearch Dev Tools: `GET alpacas/_search`
- Note that it uses an alias that is updated: `GET _alias/alpacas`
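A rough sketch of what json_to_elastic might do with the official `@elastic/elasticsearch` client is shown below. The dated-index naming, the input filename and the alias handling are illustrative guesses, not the repo's exact logic.

```js
// Sketch: bulk-index a JSON file into a fresh index, then point the `alpacas`
// alias at it. Index naming and input filename are hypothetical.
require("dotenv").config();
const { Client } = require("@elastic/elasticsearch");
const records = require("./data/alpacas.json"); // hypothetical filename

const client = new Client({
  cloud: { id: process.env.ELASTIC_CLOUD_ID },
  auth: { username: process.env.ELASTIC_USERNAME, password: process.env.ELASTIC_PASSWORD },
});

async function run() {
  const index = `alpacas-${Date.now()}`; // new physical index per run
  await client.helpers.bulk({
    datasource: records,
    onDocument: () => ({ index: { _index: index } }),
  });
  // Switch the `alpacas` alias to the new index so GET alpacas/_search keeps working.
  // A real script would also remove the alias from the previous index.
  await client.indices.updateAliases({
    actions: [{ add: { index, alias: "alpacas" } }],
  });
}

run().catch(console.error);
```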
Generate NDJSON file to import manually into Elasticsearch
- Edit the JSON filename to read from in json_to_ndjson.js and save the file
- Run `node json_to_ndjson`
- Look for the generated file in the directory
- Import this file into Elasticsearch (the conversion is sketched below)
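The conversion itself amounts to writing one bulk action line plus one source line per record, matching the `POST /_bulk` example above. A minimal sketch, with illustrative filenames:

```js
// Sketch: turn a JSON array into an NDJSON bulk payload.
const fs = require("fs");

const records = JSON.parse(fs.readFileSync("./data/alpacas.json", "utf8"));
const ndjson =
  records
    .map((doc, i) => JSON.stringify({ index: { _id: i + 1 } }) + "\n" + JSON.stringify(doc))
    .join("\n") + "\n"; // a bulk payload must end with a newline
fs.writeFileSync("./data/alpacas.ndjson", ndjson);
```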
Run the tests: `npm run test`
- Edit the file farm_category.js to contain the actual categories for farms (a hypothetical shape is sketched below)
- The values in this file override values in the database, which are not up to date
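The shape of farm_category.js is not shown in this README; as a purely hypothetical illustration, it could be a simple lookup keyed by farm name:

```js
// Hypothetical shape: map each farm to its correct category, overriding the
// stale value read from the database. Names and category values are examples only.
module.exports = {
  "Anita's Alpacas": "BREEDER",
  // "Another Farm": "HOBBY",
};
```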
Format the code
`npm run prettier`
- Location data from Google Maps
The work is under exclusive copyright by default.