4 Data cleaning

The first data analysis step that requires a bit of coding is data cleaning. Data cleaning is the most important and the most time-consuming part of the data analysis workflow. Errors and data entry mistakes that are not detected at this stage can lead to erroneous results. If the results turn out to be nonsensical, they can also force the removal of large amounts of costly field measurements at a later stage.


Figure 4.1: Workshop of OF Collect (photo: Lauri Vesa). [placeholder for image on data cleaning]

It is recommended to prepare data cleaning scripts and templates before data collection starts and to run these scripts continuously during the data collection phase. The cleaning process can then feed potential entry errors back to field crews while they are still near the measured plots. Field crews can then remeasure or correct the affected records and avoid data loss.
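As an illustration of what such a script might look like, the following minimal sketch flags tree records with missing or implausible values so that they can be reported back to the field crew. It is only a sketch under stated assumptions: the field names (`plot_id`, `tree_id`, `dbh_cm`, `height_m`), the plausibility limits and the example records are all hypothetical and would need to be replaced by the variables and thresholds of the actual inventory protocol.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical plausibility limits; real limits depend on the inventory protocol.
DBH_RANGE_CM = (5.0, 300.0)
HEIGHT_RANGE_M = (1.3, 80.0)


@dataclass
class TreeRecord:
    """One measured tree; field names are assumptions for this example."""
    plot_id: str
    tree_id: int
    dbh_cm: Optional[float]
    height_m: Optional[float]


def check_record(rec):
    """Return a list of human-readable problems found in one tree record."""
    problems = []
    for name, value, (low, high) in (
        ("dbh_cm", rec.dbh_cm, DBH_RANGE_CM),
        ("height_m", rec.height_m, HEIGHT_RANGE_M),
    ):
        if value is None:
            problems.append(f"{name} is missing")
        elif not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    return problems


def flag_records(records):
    """Map (plot_id, tree_id) to its problems, keeping only flagged trees."""
    flagged = {}
    for rec in records:
        problems = check_record(rec)
        if problems:
            flagged[(rec.plot_id, rec.tree_id)] = problems
    return flagged


# Made-up records standing in for data exported during field work.
records = [
    TreeRecord("P01", 1, 23.4, 18.2),
    TreeRecord("P01", 2, 234.0, 1.9),    # passes the simple range checks
    TreeRecord("P02", 1, 2340.0, 21.0),  # possibly a misplaced decimal point
    TreeRecord("P02", 2, 31.0, None),    # height not recorded
]

for key, problems in flag_records(records).items():
    print(key, problems)
```

Note that the second record passes both range checks even though a 234 cm diameter with a 1.9 m height is unlikely. Simple range checks catch only the most obvious entry errors, so they are a first filter rather than a complete cleaning procedure.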

4.1 Visualizations

4.2 Corrections

4.2.1 Diameter and height

4.2.2 Tree location

4.2.3 Species list