
gridSample: return spatialPoints object rather than matrix? #20

Open
plantarum opened this issue Oct 25, 2021 · 1 comment

Comments

@plantarum

I'm using gridSample to thin my occurrence records. The records are stored as a SpatialPointsDataFrame, which already includes environmental values as columns in its data frame. gridSample returns a matrix with the coordinates of the retained occurrences, but drops all the other data columns. Is it possible to thin the data in a way that retains the non-coordinate columns? Alternatively, is there a way to use the matrix returned by gridSample to subset the original SpatialPointsDataFrame?

Thanks

@plantarum

The following function does what I need.

It is based on your original code for gridSample. I add an index column to the spdf and copy it into the coordinates matrix xy. After the same processing you use to trim the records in xy, I use the index values that remain in xy to select the corresponding points in spdf, then return that subset.

I also tried modifying the spdf object directly without extracting coordinates, but binding spatial data frames together in the for loop is much slower than binding matrices: 40 seconds vs 2 seconds in my tests.

If this looks reasonable and generally useful, I could prepare a pull request. Or if you know a more direct way to do this with existing functions, please let me know!

## requires the sp and raster packages:
library(sp)
library(raster)

gridSampleTWS <- 
  function (spdf, r, n = 1) 
{
  ## add a new column with a unique index value for each
  ## record: 

  spdf$RECNUM <- seq_len(length(spdf))

  ## extract coordinates along with index:
  xy <- cbind(coordinates(spdf), index = spdf$RECNUM)

  r <- raster(r)
  cell <- cellFromXY(r, xy)

  ## uc contains the numbers for all non-NA cells (no duplicates)
  uc <- unique(stats::na.omit(cell))

  ## add the cell numbers to the table of coordinates 
  xy <- cbind(xy, cell = cell, rand = runif(nrow(xy)))
  
  ## drop missing cells:
  xy <- stats::na.omit(xy)

  ## seems unlikely there'll be duplicate rows, when one column
  ## is a newly-generated random number?

  xy <- unique(xy)

  ## sort by our random number:
  ## (drop = FALSE keeps xy a matrix even if only one row remains)
  xy <- xy[order(xy[, "rand"]), , drop = FALSE]
  xy <- as.data.frame(xy)
  pts <- data.frame(numeric(), numeric(), numeric(),
                   numeric(), numeric())
  names(pts) <- names(xy)
  for (u in uc) {
    ss <- subset(xy, xy[, "cell"] == u)
    pts <- rbind(pts, ss[1:min(n, nrow(ss)), ])
  }

  ret <- spdf[spdf$RECNUM %in% pts$index, ]
  ret <- ret[, names(ret) != "RECNUM"]
  return(ret)
}
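
For comparison, here is a minimal base-R sketch of the same idea (at most n random records per grid cell, with every column kept) that avoids the rbind loop. It is not the gridSample implementation: thin_by_cell() and its arguments are hypothetical names, the input is a plain data frame rather than a SpatialPointsDataFrame, and the grid is assumed to be a simple square lattice of cell size res instead of a raster.

```r
## Sketch only: thin_by_cell(), xcol, ycol and res are made-up names,
## and the "grid" is a square lattice, not a raster object.
thin_by_cell <- function(df, xcol, ycol, res, n = 1) {
  ## drop records with missing coordinates
  df <- df[stats::complete.cases(df[, c(xcol, ycol)]), , drop = FALSE]

  ## label each record with the lattice cell it falls in
  cell <- paste(floor(df[[xcol]] / res), floor(df[[ycol]] / res), sep = "_")

  ## rank records within each cell by a random number
  rnk <- stats::ave(stats::runif(nrow(df)), cell, FUN = rank)

  ## keep at most n records per cell; all other columns come along
  df[rnk <= n, , drop = FALSE]
}

set.seed(1)
occ <- data.frame(x = runif(200, 0, 10),
                  y = runif(200, 0, 10),
                  temp = rnorm(200, 15, 3))
thinned <- thin_by_cell(occ, "x", "y", res = 2, n = 1)
```

Because the selection ends up as a logical vector over the rows, the same vector could presumably be used to subset a SpatialPointsDataFrame directly, in the way the function above subsets spdf with %in%.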
