Get ROI values from raster and combine all CSV (remote sensing statistics with R)
EDIT in 2018 : More than two years later after this post, I just discover it back and realize how crazy I was to use R for Remote Sensing and for extracting ROI. I really suggest you to use Python or Orfeo Tool Box (Sample Extraction) to do this thing, and really not R !
R is a great tool for statistics, even if it may be slow in calculing raster’stats, it is really a good way for batching all of your work and presenting your data using MarkDown language. This example is one of my first autonomous script in R, and it is very exciting for further.
Report and scripts made during my last year placement at the Guiana Amazonian Park are on github.
Before showing you the different steps of the code, I need to present you the way I structured my data, because the script is going to parse them and to compile all tables in one file.
How did I structure my data ?
As I want to calculate my statistics also for indices (like NDCI, NDVI, EVI2, …), for date and location (which is the scene name, so the folder name), I made a protocol of naming.
First, each scene has an unique folder, and the file naming is set that way :
- %date%-%scene%.tif
- %date%-%scene%.shp
- %date%-%scene%_%indice%.tif
So the main image and ROI (shp) have exactly the same name, and the indice have just a _indice in more before the extension. That will be very useful later, you’ll see.

Extract ROI values from raster bands
Now that you know how I organized my data, here’s a demo on how to extract the ROI from each band and save it into a table (mydf).
I set inRaster and inShape manually this time to show you how data is extracted :
inRaster <- 'images/685-343/20120505-6685-343.tif' inShape <- 'images/685-343/20120505-6685-343.shp' inShape <- readShapePoly(inShape) # Loop throught each band for(i in 1:inRaster@file@nbands){ openBand <- raster(inRaster,band=i) # SPOT5 image so 4 bands, Green(V), Red, NIR and MIR if(i==1){band='V'} else if(i==2){band='R'} else if(i==3){band='NIR'} else if(i==4){band='MIR'} print(paste('calculating',band,sep=' ')) # Assign band name with min/max like NIRmin, NIRmax... mydfMin <- paste(band, 'min', sep = "") mydfMax <- paste(band, 'max', sep = "") # Extract value and save it in right column minB <- extract(openBand,inShape, fun = min) maxB <- extract(openBand,inShape, fun = max) mydf[mydfMin] <- round(minB,2) mydf[mydfMax] <- round(maxB,2) }You will obtain a table like this one :
Automatic find main image, indices and shapefile
How to specify to first extract data from the main image, and not from the indices ? As you’ve seen in my naming protocol, indices have longer name than main images, so we have only to order files by length and we’ll have main images at the beginning. This is done to create a first csv file, and then add data with indices.
To order a list by length in R, just do it with your list :# Get all tif from folder inRaster <- list.files(folder,pattern='*.tif$') # Order list to have main image first inRaster[order(nchar(inRaster))]As my files have exactly the same beginning when it’s the same data/scene, you need to split the name with the character “” and with “.” for the extension.
If there is no ““, it means it is your main image and not an indice.# get fileName with no extension and/or no indice fileName <- strsplit(strsplit(rasterLoop, "\\.")[[1]], "\\_")[[1]] inShape <- paste(fileName[1],'.shp',sep='') openRaster <- raster(rasterLoop) # if indice image (type : image_ndvi.tif), take same shp as original image # fileName[2] is the indice name, or it's NA if (!is.na(fileName[2])){ # if mask, avoid calc if(fileName[2]=='mask'){ avoidImage <- TRUE } else{ indiceImage <- TRUE } indiceName <- fileName[2] } else{ indiceImage <- FALSE }By this way the script will find the ROI (inShape) using the basename of the file, and set variable indiceImage to TRUE to make column name with NDVImin/NDVImax, and so on…
It’s almost done ! Just need to combine CSV
When all the statistics have been done, and it could take a couple of time, we need to combine all the table into one, and depending missing columns from same csv.
Default method is to use cbind for combining csv, but if you have for example one more column in a csv, it won’t work. So I use gtools’ function smartbind which replace empty values by NA, quite useful !
# Get all csv in all subfolders inCsv <- list.files(recursive=TRUE,pattern='*.csv$') outCsv <- 'zonalStat.csv' # save it at root with name zonalStat.csv if(file.exists(outCsv)){ file.remove(outCsv) } # create empty data frame df <- data.frame() # loop throught csv and save it in df for(i in inCsv){ print(i) temp <- read.csv2(i) #using smartbind from gtools to combine if different column number df <- smartbind(df, temp) rm(temp) } write.csv2(df,outCsv)