Data is one of the biggest pillars, perhaps the biggest, of Rural Water Intelligence (RWI) solutions. Data is alive, not static: it goes through different phases, from capture to storage and interpretation. It follows what I like to call the Data cycle. For the operational data in RWI systems, five steps can be identified in this cycle:

  • Data generation and collection
  • Data integration
  • Data validation and consolidation
  • Data interpretation
  • Data processing

In this series of articles we will review each of these phases in turn, from data generation to data processing and reuse, through data integration, validation, consolidation and analysis: a review of the life (and lives) of our data and of how it is managed, transformed and used in RWI systems.

Phase I. Data generation and collection

The first step is the birth of the data (its generation) and its capture. This phase covers the data's journey from the moment it is produced or generated until it reaches one of the inputs of our system.

Our data can come from sources of very different natures. For example, we may be looking at a water level reading from a water level sensor, or at a precipitation measurement from a rain gauge installed on an automatic weather station. The data does not even need to come from a physical source: we may be looking at precipitation forecast values generated by the Bureau of Meteorology. It can even be generated manually, e.g. water orders entered by farmers.
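One way to keep this diversity visible further down the cycle is to store, alongside each value, what kind of source produced it. The sketch below is a hypothetical record type: the class names, fields, source identifiers and sample values are illustrative assumptions, not part of any particular RWI product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class SourceKind(Enum):
    SENSOR = "sensor"   # physical measurement, e.g. a water level sensor or rain gauge
    MODEL = "model"     # computed value, e.g. a precipitation forecast
    MANUAL = "manual"   # human input, e.g. a water order entered by a farmer


@dataclass(frozen=True)
class DataPoint:
    """One value plus the provenance needed to interpret it later."""
    source_id: str       # hypothetical identifier of the producing source
    kind: SourceKind
    variable: str
    value: float
    unit: str
    timestamp: datetime  # when the value was measured, is valid for, or was entered


T0 = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)

# Hypothetical examples, one for each kind of source described above.
examples = [
    DataPoint("channel-12-level", SourceKind.SENSOR, "water_level", 1.42, "m", T0),
    DataPoint("aws-03-raingauge", SourceKind.SENSOR, "precipitation", 2.4, "mm", T0),
    DataPoint("bom-forecast", SourceKind.MODEL, "precipitation", 5.0, "mm", T0),
    DataPoint("farmer-portal", SourceKind.MANUAL, "water_order", 12.0, "ML", T0),
]
```

Keeping the record immutable (`frozen=True`) means the original observation cannot be silently overwritten by later phases; corrected values would be stored as new records instead.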

At the beginning of this journey, at the moment of capture, the originated data (the observation) depends on the measurement device (the observer). And no, I don't mean whether the cat in the box is alive or not; well, at least I don't want to go that far. If the value is generated rather than measured, it depends instead on its generation process, whether that process is automatic or manual.

For example, data from a water level sensor is nothing more than the quantification of a physical event in the real world (the water in a channel going up or down). But how is this water level measured? With an ultrasonic sensor? A pressure transducer?
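To make the dependence on the observer concrete, here is a minimal sketch of how the same quantity, a water level, is derived from two different raw measurements. It assumes an ultrasonic sensor mounted at a known height above the level datum that reports the echo round-trip time, and a vented (gauge) pressure transducer in fresh water at a known height above the same datum. The function names, parameters and the linear speed-of-sound approximation are illustrative assumptions, not taken from any particular device.

```python
RHO_WATER = 1000.0  # kg/m^3, assumed fresh water density
G = 9.80665         # m/s^2, standard gravity


def speed_of_sound_air(temp_c: float) -> float:
    """Common linear approximation of the speed of sound in dry air (m/s)."""
    return 331.3 + 0.606 * temp_c


def level_from_ultrasonic(echo_time_s: float, mount_height_m: float,
                          air_temp_c: float = 20.0) -> float:
    """Water level above datum from an ultrasonic echo round-trip time."""
    # The pulse travels down to the water surface and back, hence the /2.
    distance_m = speed_of_sound_air(air_temp_c) * echo_time_s / 2.0
    return mount_height_m - distance_m


def level_from_pressure(gauge_pressure_pa: float,
                        sensor_height_m: float = 0.0) -> float:
    """Water level above datum from a submerged gauge pressure reading."""
    # Hydrostatic relation P = rho * g * h gives the water depth above the sensor.
    return gauge_pressure_pa / (RHO_WATER * G) + sensor_height_m
```

With the same 10 ms echo and a sensor mounted 3 m above the datum, the computed level differs by about 6 cm between an air temperature of 0 °C and one of 20 °C. This is one reason why knowing how a value was measured matters.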

Or, in the case of our precipitation forecast, the data is the output of a model (usually numerical weather prediction models whose runs are then combined using ensemble techniques), which in turn relies on more data (historical records, interpolated values, precalculated parameters, etc.) that was, at some stage, itself measured, calculated or even forecast. And so on.
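The idea of combining several model runs can be illustrated with a deliberately simplified sketch. Given the precipitation that each ensemble member forecasts for the same place and time, an equal-weight mean gives a central estimate, and the fraction of members at or above a threshold gives a rough probability. The function name, the equal weighting and the threshold are hypothetical choices; real forecasting systems generally apply considerably more elaborate post-processing.

```python
from statistics import mean
from typing import Sequence, Tuple


def ensemble_summary(members_mm: Sequence[float],
                     threshold_mm: float) -> Tuple[float, float]:
    """Return (ensemble mean, fraction of members >= threshold) for one location and time."""
    if not members_mm:
        raise ValueError("need at least one ensemble member")
    # Equal weighting: every member counts the same (a simplifying assumption).
    exceeding = sum(1 for m in members_mm if m >= threshold_mm)
    return mean(members_mm), exceeding / len(members_mm)


# Four hypothetical members forecasting 0, 2, 4 and 10 mm give a mean of 4 mm,
# and 3 of the 4 members reach at least 1 mm.
summary = ensemble_summary([0.0, 2.0, 4.0, 10.0], threshold_mm=1.0)
```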

But that is not all. The data has been originated, but how do we get to it? It may be more complicated than it seems. If it comes from a model, we can probably find it on a web service or an FTP server. But if the data has been captured by a remote water level sensor, it has to be transported to the appropriate SCADA system (by radio, GPRS, 3G, etc.). And how and when is the data stored in the SCADA? There are plenty of alternatives: the data can be sent every fifteen minutes (either polled by the SCADA server or pushed by the remote unit), or it can be sent only after the level has changed by a certain amount (what is called the deadband), or even a combination of both.
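The combined strategy can be sketched as a filter that decides which samples taken at the remote unit actually reach the SCADA. A sample is sent if it differs from the last reported value by at least the deadband, or if the maximum interval since the last report has elapsed. This is a hypothetical illustration: the function name and parameters are assumptions, and real remote units and SCADA systems implement and configure this in their own, vendor-specific ways.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, water level in metres)


def reported_samples(samples: Iterable[Sample], deadband_m: float,
                     max_interval_s: float) -> List[Sample]:
    """Return the samples a deadband-plus-periodic scheme would send to the SCADA."""
    reported: List[Sample] = []
    last_t = last_v = None
    for t, v in samples:
        first = last_t is None
        # Report on a significant change (at or beyond the deadband) ...
        changed = not first and abs(v - last_v) >= deadband_m
        # ... or when the periodic report is due, even if nothing changed.
        overdue = not first and (t - last_t) >= max_interval_s
        if first or changed or overdue:
            reported.append((t, v))
            last_t, last_v = t, v
    return reported


# One-minute samples, a 5 cm deadband and a fifteen-minute maximum interval.
readings = [(0, 1.00), (60, 1.01), (120, 1.02), (180, 1.06), (240, 1.065), (1140, 1.07)]
sent = reported_samples(readings, deadband_m=0.05, max_interval_s=900)
```

Here only three of the six samples are sent: the first one, the jump to 1.06 m, and the periodic report after fifteen minutes without a significant change. The SCADA's copy of the data is therefore already a reduced version of what the sensor observed.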

And that is only the beginning of the data's journey. The data has now been generated and stored in one of the data sources of our system (a SCADA system, a web service, an FTP server, a database, etc.), waiting to be extracted and integrated into our RWI system. But that is another story.

Note for the expert and avid reader: this article is not meant to be complete or exhaustive. I just wanted to show the complexity of the different sources of RWI data and the diversity of its nature. Feel free to leave a comment or correction if you want to share something. Feedback is very welcome.