Get ROI values from raster and combine all CSV (remote sensing statistics with R)

EDIT in 2018 : More than two years later after this post, I just discover it back and realize how crazy I was to use R for Remote Sensing and for extracting ROI. I really suggest you to use Python or Orfeo Tool Box (Sample Extraction) to do this thing, and really not R !

R is a great tool for statistics, even if it may be slow in calculing raster’stats, it is really a good way for batching all of your work and presenting your data using MarkDown language. This example is one of my first autonomous script in R, and it is very exciting for further.

Report and scripts made during my last year placement at the Guiana Amazonian Park are on github.

Before showing you the different steps of the code, I need to present you the way I structured my data, because the script is going to parse them and to compile all tables in one file.

How did I structure my data ?

As I want to calculate my statistics also for indices (like NDCI, NDVI, EVI2, …), for date and location (which is the scene name, so the folder name), I made a protocol of naming.

First, each scene has an unique folder, and the file naming is set that way :

  • %date%-%scene%.tif
  • %date%-%scene%.shp
  • %date%-%scene%_%indice%.tif

So the main image and ROI (shp) have exactly the same name, and the indice have just a _indice in more before the extension. That will be very useful later, you’ll see.

Hierarchy and naming of my files
Hierarchy and naming of my files

Extract ROI values from raster bands

Now that you know how I organized my data, here’s a demo on how to extract the ROI from each band and save it into a table (mydf).
I set inRaster and inShape manually this time to show you how data is extracted :

inRaster <- 'images/685-343/20120505-6685-343.tif'
inShape <- 'images/685-343/20120505-6685-343.shp'

inShape <- readShapePoly(inShape)

# Loop throught each band
for(i in 1:inRaster@file@nbands){
 openBand <- raster(inRaster,band=i)
 # SPOT5 image so 4 bands, Green(V), Red, NIR and MIR
 else if(i==2){band='R'}
 else if(i==3){band='NIR'}
 else if(i==4){band='MIR'}
 print(paste('calculating',band,sep=' '))

 # Assign band name with min/max like NIRmin, NIRmax...
 mydfMin <- paste(band, 'min', sep = "")
 mydfMax <- paste(band, 'max', sep = "")

 # Extract value and save it in right column
 minB <- extract(openBand,inShape, fun = min)
 maxB <- extract(openBand,inShape, fun = max)
 mydf[mydfMin] <- round(minB,2)
 mydf[mydfMax] <- round(maxB,2)

You will obtain a table like this one :


Automatic find main image, indices and shapefile

How to specify to first extract data from the main image, and not from the indices ? As you’ve seen in my naming protocol, indices have longer name than main images, so we have only to order files by length and we’ll have main images at the beginning. This is done to create a first csv file, and then add data with indices.
To order a list by length in R, just do it with your list :

# Get all tif from folder
inRaster <- list.files(folder,pattern='*.tif$')

# Order list to have main image first

As my files have exactly the same beginning when it’s the same data/scene, you need to split the name with the character “” and with “.” for the extension.
If there is no “
“, it means it is your main image and not an indice.

# get fileName with no extension and/or no indice
fileName <- strsplit(strsplit(rasterLoop, "\\.")[[1]], "\\_")[[1]]
inShape <- paste(fileName[1],'.shp',sep='')

openRaster <- raster(rasterLoop)

# if indice image (type : image_ndvi.tif), take same shp as original image
# fileName[2] is the indice name, or it's NA
if (![2])){

  # if mask, avoid calc
    avoidImage <- TRUE
    indiceImage <- TRUE
  indiceName <- fileName[2]

  indiceImage <- FALSE

By this way the script will find the ROI (inShape) using the basename of the file, and set variable indiceImage to TRUE to make column name with NDVImin/NDVImax, and so on…

It’s almost done ! Just need to combine CSV

When all the statistics have been done, and it could take a couple of time, we need to combine all the table into one, and depending missing columns from same csv.

Default method is to use cbind for combining csv, but if you have for example one more column in a csv, it won’t work. So I use gtools’ function smartbind which replace empty values by NA, quite useful !

# Get all csv in all subfolders
inCsv <- list.files(recursive=TRUE,pattern='*.csv$')
outCsv <- 'zonalStat.csv'
# save it at root with name zonalStat.csv

# create empty data frame
df <- data.frame()

# loop throught csv and save it in df
for(i in inCsv){
 temp <- read.csv2(i)

 #using smartbind from gtools to combine if different column number
 df <- smartbind(df, temp)