-
Notifications
You must be signed in to change notification settings - Fork 0
/
03-faoswsAquastatExternal.Rmd
208 lines (134 loc) · 8.51 KB
/
03-faoswsAquastatExternal.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
# **The faoswsAquastatExternal module** {#AquastatExternal}
The **faoswsAquastatExternal** is essentially a data harvester that emerges as demand for making the AQUASTAT process of extracting and reshaping data from external sources (web databases) easier.
```{r figExternal, echo=FALSE, out.width="65%", fig.align="center", fig.show='hold', fig.cap='Workflow of the faoswsAquastatExternal module'}
knitr::include_graphics("images/faoswsAquastatExternal.png")
```
<!--*Only Possible* **Workflow**
<br>
```{block , type='rmdnote'}
**FAOSTAT, the ONLY REACHABLE DATA SOURCE:**
The harvest of all external data was proved to be not possible due to IT security reasons. During the development, it was noticed that SWS modules running in SWS could not access external downloadable links. The only exceptions are the data from FAOSTAT.
Therefore, while the module does not have the clearance to download external data, the `faoswsAquastatExternal` module will only harvest FAOSTAT data.
```-->
<!--```{r figExternalPossible, echo=FALSE, out.width="65%", fig.align="center", fig.show='hold', fig.cap='Workflow of the faoswsAquastatExternal module harvesting FAOSTAT data only.'}
knitr::include_graphics("images/POSSIBLEfaoswsAquastatExetrnal.png")
```-->
## **Steps**
The module executes the following steps to harvester external (`FAOSTAT`) data.
### **Identifies AQUASTAT external data sources**
The module needs to identify the external data used by AQUASTAT. This identification is made with the help of the SWS datatable **aqua_external_sources** below:
```{r tab3, echo=FALSE, message=FALSE}
require(data.table)
require(kableExtra)
d = head(data.table::fread("tables/aqua_external_sources_corr.csv"))
knitr::kable(d,
align = 'c',
# escape = T,
# format = "html",
# table.attr='class="table-fixed-header"',
caption = "AQUASTAT external sources - First rows") %>%
row_spec(0, background = "#b6c5d4") %>%
# kableExtra::kable_styling(font_size = 12) %>%
kable_styling(fixed_thead = T)
```
<br>
After identification, the module downloads the data to temporary files and read it in to start the processing.
### **Reshapes the data sources**
Data from different sources are likely to occur in different formats. Therefore, after the download of the data, the module needs to apply a different reshaping strategy to each data source. Regardless the initial shape of the external AQUASTAT data sources, the module is going to convert **each** single source into an SWS long format dataset with **geographicAreaM49**, **aquastatElement**, **timePointYears**, **flagOvservationStatus** and **flagMethod**.
The **flagOvservationStatus** and **flagMethod** are **"X"** and **"c"** for all data extracted by the faoswsAquasatExternal module.
### **Maps out AQUASTAT to external sources**
With the data already reshaped, it is effortless for the module to keep relevant AQUASTAT elements since the **aqua_external_source** data table provides the means for correctly mapping out the original external data code to AQUASTAT codes.
### **Maps out FAOSTAT areas to UNSDM49**
The SWS datatable **m49_fs_iso_mapping** has the correspondence among different international codes (FAOSTAT, UNSDM49, ISO2, ISO3) for geographic areas and is used to convert area codes in the external sources to UNSDM49 codes which is the standard in the SWS.
### **Saves the output back into SWS**
Finally, the module merges the sources into a single dataset. This output is an SWS-compliant long-format dataset name **aquastat_external** that is saved by the user in the SWS and will be ready to serve as an **input** of the **faoswsAquastatUpdate** module.
<!--```{r fig1, echo=FALSE, fig.cap='Workflow of the faoswsAquastatExternal module.'}
library(DiagrammeR)
DiagrammeR::grViz("digraph {
graph [layout = dot, rankdir = LR]
# define the global styles of the nodes. We can override these in box if we wish
node [shape = rectangle, style = filled, fillcolor = Linen]
# Inputs
FAO [label = 'Dataset: \n FAO data', shape = folder, fillcolor = Beige]
ILO [label = 'Dataset: \n ILO data', shape = folder, fillcolor = Beige]
UNDP [label = 'Dataset: \n ILO data', shape = folder, fillcolor = Beige]
WB [label = 'Dataset: \n World Bank data', shape = folder, fillcolor = Beige]
JMP [label = 'Dataset: \n JMP(WHO/UNICEF) data', shape = folder, fillcolor = Beige]
# data tables
aquaexter [label = 'Data table: \n aqua_external_source data', shape = folder, fillcolor = LightGrey]
m49iso [label = 'Data table: \n m49_fs_iso_mapping', shape = folder, fillcolor = LightGrey]
# processing
externalmodule [label = ' Processing: \n faoswsAquastatExternal using:\n 1. Download and read in the data, \n 2. Reshape data \n to an long-format dataset, \n 3. Map out original codes to AQUASTAT codes, \n 4. Retain relevant elements', shape = rectangle, fillcolor = LightBlue]
# Output
Output [label = 'Dataset: \n aquastat_external \n External data in an SWS compliant dataset', shape = folder, fillcolor = Beige]
# Flow
# edge definitions with the node IDs
{FAO, ILO, UNDP, WB, JMP} -> externalmodule -> Output
{aquaexter, m49iso} -> externalmodule
}")
```-->
<!--```{r possiblewf, echo=FALSE, fig.cap='Workflow of the faoswsAquastatExternal module HARVESTING FAOSTAT data only.'}
library(DiagrammeR)
DiagrammeR::grViz("digraph {
graph [layout = dot, rankdir = LR]
# define the global styles of the nodes. We can override these in box if we wish
node [shape = rectangle, style = filled, fillcolor = Linen]
# Inputs
FAO [label = 'Dataset: \n FAO data', shape = folder, fillcolor = Beige]
ILO [label = 'Source: \n ILO data', shape = folder, fillcolor = Red]
UNDP [label = 'Source: \n ILO data', shape = folder, fillcolor = Red]
WB [label = 'Source: \n World Bank data', shape = folder, fillcolor = Red]
JMP [label = 'Source: \n JMP(WHO/UNICEF) data', shape = folder, fillcolor = Red]
blocked [label = 'Blocked: \n Data sources with access denied by SWS', fillcolor = Beige]
# data tables
aquaexter [label = 'Data table: \n aqua_external_sources', shape = folder, fillcolor = LightGrey]
m49iso [label = 'Data table: \n m49_fs_iso_mapping', shape = folder, fillcolor = LightGrey]
# processing
externalmodule [label = ' Processing: \n faoswsAquastatExternal using:\n 1. Download and read in the data, \n 2. Reshape data \n to an long-format dataset, \n 3. Map out original codes to AQUASTAT codes, \n 4. Retain relevant elements', shape = rectangle, fillcolor = LightBlue]
# Output
Output [label = 'Dataset: \n aquastat_external \n External data in an SWS compliant dataset', shape = folder, fillcolor = Beige]
# Flow
# edge definitions with the node IDs
{FAO} -> externalmodule -> Output
blocked -> {ILO, UNDP, WB, JMP}
{aquaexter,m49iso} -> externalmodule
}")
```-->
## **Running the module**
1. Log in the SWS;
2. Click on **New Query**;
3. Select **AQUASTAT domain** and **aquastat_external dataset**;
4. Select whatever geographicM49Area, aquastatElement, and timePointYears;
<br>
```{r extquery, echo=FALSE, out.width="100%", fig.cap='Steps 1 to 4'}
knitr::include_graphics("images/aqua_external_query.png")
```
<br>
<br>
```{block , type='rmdwarning'}
The **faoswsAquastatExternal** searches and downloads data from external sources. Therefore, what the user is selecting in the query is irrelevant for the module's output.
```
<br>
5. Run the query and get an empty session;
<br>
```{r extempty, echo=FALSE, out.width="100%", fig.cap='Empty SWS session of the aquastat_external dataset'}
knitr::include_graphics("images/aqua_external_empty.png")
```
<br>
6. Click on **Run plugin** on the top-right;
7. Select the **faoswsAquastatExternal** module and click on **Run plugin**;
<br>
```{r extplugin, echo=FALSE, out.width="100%", fig.cap='Select the AquastatExternal plugin and run it'}
knitr::include_graphics("images/aqua_external_plugin.png")
```
<br>
8. Wait for the results to appear in the session;
<br>
```{r extoutput, echo=FALSE, out.width="100%", fig.cap='Select the AquastatExternal plugin output in the session'}
knitr::include_graphics("images/aqua_external_output.png")
```
<br>
9. Click on **Save to dataset**. In case new external data are available, please click on **Save to dataset** for the new data to be integrated to the **aquastat_external** dataset and later to the **aquastat_update** dataset;
```{block , type='rmdtip'}
The aquastat_external dataset has already been saved in the SWS database to prop up the **faoswsAquastatUpdate** module development. If the user knows that the External sources have not been updated, he/she does not neeed to save the dataset gain.
```