Contents

Coding Period: First Phase Part 2

Phase 1 of the coding period is about to get over and 2 weeks have already passed since I gave an update on my GSoC project. As per my last blog, I successfully implemented the prototype transformation methods of the base interface, let’s look at the developments happened after the first 2 weeks.

Progress on the Project

The main goal of the first phase was to implement and finalize the base raster interface, in other words, I had to incorporate all methods which can be utilized to make the transformations from xarray.DataArray to both weight objects of PySAL (explained in my previous blog) and convert data back to xarray.DataArray. In the first 2 weeks of the coding period, we discussed the potential API of these methods and I worked on the prototype functions (this pr shows the initial prototype).

The proposed timeline in the project proposal for last 2 weeks is listed below:

  • Design the Uniform API which will aid in building the interface and in integrating well across PySAL library.
  • Finalize the logic of the transformational methods based on eager evaluation.
  • Add additional functionality to convert raster data straight to Geo/Pandas Dataframe
  • Implement other required raster utility methods.
  • Create utility functions that will be required for reshaping and aligning the raster data with spatial weights matrix.
  • Create a functionality for processing the outputs from PySAL to write them out to rasterio object. (rasterio was replaced by xarray, read last blog for the reason)

Most of the tasks were achieved during this period except 3rd one which is not in our plan currently but I will look into it in the later stage.

New API Design/Enhancements

After several discussions with my mentors it was decided that the better way to present the transformational methods to the user is by using class methods through the contiguity classes (Rook and Queen for now). This required some modification in the existing weights object, current approach adds one additional attribute to store the coordinates of non-missing values (this will be further discussed with the rest of the community during this weeks group call). After finalizing the base interface by adding necessary changes targeting the community feedback I’ll add the tests.

Example Notebook for Raster Interface

This notebook will provide an overview of ways we can use newly added raster interface to operate on raster data.

https://mybinder.org/badge_logo.svg

1
2
3
4
5
6
%matplotlib inline

from libpysal.weights import Rook, Queen
import numpy as np
import xarray as xr
from esda import Moran_Local
/data/GSoC/libpysal/libpysal/weights/util.py:20: UserWarning: geopandas not available. Some functionality will be disabled.
  warn('geopandas not available. Some functionality will be disabled.')

Currently each method inside this interface only accepts xarray.DataArray as input, let’s read in some data using open_rasterio() method:

1
2
3
4
# Multiband band raster data with no missing values
da1 = xr.open_rasterio('HARV_RGB_Ortho.tif')
# Slicing the dataarray
da1 = da1[:, 600:700, 700:800] 
1
2
# Ploting only first band of the da
da1[:1].plot()
png

After accessing the raster data we will create weights object using a method called from_xarray from Rook class

1
2
# Let's check the method first
help(Rook.from_xarray)
Help on method from_xarray in module libpysal.weights.contiguity:

from_xarray(da, band=None, sparse=False, **kwargs) method of builtins.type instance
    Construct a weights object from a xarray.DataArray object 
    
    Parameters
    ----------
    da         : xarray.DataArray
                raster file accessed using xarray.open_rasterio method
    band       : int
                select band for raster with multiple bands
    sparse     : boolean
                type of weight object. Default is dense. For sparse, sparse = True
    **kwargs   : keyword arguments
                optional arguments for :class:`pysal.weights.W`
    See Also
    --------
    :class:`libpysal.weights.weights.W`

from_xarray method will first check if data can be converted and if the bands is passed or not, if not passed then it’ll select first band as default

1
2
# Creating weight of object from raster
w_rook = Rook.from_xarray(da1)
/data/GSoC/libpysal/libpysal/weights/raster.py:37: UserWarning: Multiple bands detected in the DataArray. Using default band 1 for further computation
  warn('Multiple bands detected in the DataArray. Using default band 1 for further computation')

This created weight object contains one additional attribute coords which stores the coordinates of non missing values in the form of MultiIndex of pandas.Series

1
2
# Quick view at the data stored in coords
w_rook.coords[:5]
MultiIndex([(1, 4713385.375, 732173.625),
            (1, 4713385.375, 732173.875),
            (1, 4713385.375, 732174.125),
            (1, 4713385.375, 732174.375),
            (1, 4713385.375, 732174.625)],
           names=['band', 'y', 'x'])

After converting DataArray to pandas,Series, we can access non missing values of raster data (this is still under process of review)

1
2
3
# Loading the raster data for further calculation
data1 = da1.to_series()[w_rook.coords]
data1[:5]
band  y            x         
1     4713385.375  732173.625    132.0
                   732173.875     94.0
                   732174.125     91.0
                   732174.375    110.0
                   732174.625    106.0
dtype: float64
1
2
3
# Quickly computing and loading a LISA
np.random.seed(12345)
lisa = Moran_Local(data1,w_rook)

We’ll now try to convert computed data back to xarray.DataArray, we can use to_xarray method belonging to weight method for the transformation.

1
2
# Let's check the method first
help(w_rook.to_xarray)
Help on method to_xarray in module libpysal.weights.weights:

to_xarray(data, indices=None, attrs=None) method of libpysal.weights.weights.W instance
    converts the aligned data to a `xarray.DataArray` object
    
    Arguments
    ---------
    data    :   array
                data values stored in 1d array
    indices :   Dictionary/xarray.core.coordinates.DataArrayCoordinates
                coordinates from original DataArray
    attrs   :   Dictionary
                attributes from original DataArray 
    Returns
    -------
    
    da : xarray.DataArray
        instance of xarray.DataArray
1
2
# Converting obtained data back to DataArray
moran_da1 = w_rook.to_xarray(lisa.p_sim, da1.coords, da1.attrs)
1
2
# Plotting the converted DataArray
moran_da1.plot()
png

We’ll now perform same operations on different raster data

1
2
3
4
# single band raster data with missing values
da2 = xr.open_rasterio('nasadem_sd.tif')
# slicing the dataarray
da2 = da2[:, :100, 300:400]
1
2
# ploting dataarray after removing missing values
da2.where(da2.data>0).plot()
png

This time we will create weights object using Queen class

1
2
# Creating weight of object from raster
w_queen = Queen.from_xarray(da2)
1
2
3
# Loading the raster data for further calculation
data2 = da2.to_series()[w_queen.coords]
data2[:5]
band  y       x          
1     33.505  -117.509444    256
              -117.509167    261
              -117.508889    263
              -117.508611    263
              -117.508333    264
dtype: int16
1
2
3
# Quickly computing and loading a LISA
np.random.seed(12345)
lisa = Moran_Local(data2,w_queen)
1
2
# Let's check the method first
moran_da2 = w_queen.to_xarray(lisa.p_sim, da2.coords, da2.attrs)
1
2
# Plotting the converted DataArray
moran_da2.where(moran_da2.data>0).plot()
png

Experience

Without a doubt, this summer has been very interesting and worthwhile with tons of knowledge gained and new things explored related to the project. After every interaction with my mentors, my appreciation for PySAL grow even more, I’m able to understand the design decisions went behind the creation of APIs, different methods and overall structure of each module in PySAL library.

What’s next?

There are some API fixes which my mentors suggested I’ll work on those side by side I’ll also work on the feedback and suggestions which will be given by the community in the upcoming group call and will try to finalize the interface before next week, after that I’ll add remaining tests and docstring examples.

Wait for my next blog which will shed some light on the optimization part along with the integration of the interface with other parts of the library… Till then adios amigos 👋.