An intro to EOCube

[1]:
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd
from pathlib import Path
import rasterio
import seaborn as sns

# from eobox.raster import MultiRasterIO
from eobox import sampledata
from eobox.raster import cube



def get_sampledata(year):
    dataset = sampledata.get_dataset("lsts")
    layers_paths = [Path(p) for p in dataset["raster_files"]]
    layers_df = pd.Series([p.stem for p in layers_paths]).str.split("_", expand=True) \
    .rename({0: "sceneid", 1:"band"}, axis=1)

    layers_df["date"] = pd.to_datetime(layers_df.sceneid.str[9:16], format="%Y%j")
    layers_df["uname"] = layers_df.sceneid.str[:3] + "_" + layers_df.date.dt.strftime("%Y-%m-%d") + "_" + layers_df.band.str[::]
    layers_df["path"] = layers_paths

    layers_df = layers_df.sort_values(["date", "band"])
    layers_df = layers_df.reset_index(drop=True)

    layers_df_year = layers_df[(layers_df.date >= str(year)) & (layers_df.date < str(year+1))]
    layers_df_year = layers_df_year.reset_index(drop=True)
    return layers_df_year

Input data constraints

With EOCube we can, for example, calculate temporal features from stacks of data that do not fit into memory.

The input data must satisfy the following constraints:

  • Each layer, i.e. a single band of an acquisition, is available as a single-band, GDAL-readable raster file.

  • All files represent rasters that are exactly spatially aligned, i.e. they share the same resolution, extent, projection, number of rows and columns, etc.

  • (If the above mostly holds, you can band-subset or clip the data as VRTs to meet the spatial-extent and single-layer-file constraints.)

  • The layers need to be listed in a pandas DataFrame that contains at least a path column with the respective file paths.

For calculating temporal features, date, band and sceneid - as in the following sample dataset - are useful additional columns that you can access later in your custom code.
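A minimal layers DataFrame satisfying these constraints could be built like this. The file names here are hypothetical and serve only as a sketch; strictly, only the path column is required:

```python
import pandas as pd
from pathlib import Path

# Hypothetical single-band raster files, one per acquisition and band.
paths = [
    Path("data/LT50350322008110PAC01_b3.tif"),
    Path("data/LT50350322008110PAC01_b4.tif"),
    Path("data/LE70350322008118EDC00_b3.tif"),
]

df = pd.DataFrame({"path": paths})
# Optional but useful columns, parsed from the file names.
df["sceneid"] = [p.stem.split("_")[0] for p in paths]
df["band"] = [p.stem.split("_")[1] for p in paths]
# Landsat scene IDs encode the acquisition date as year + day-of-year.
df["date"] = pd.to_datetime(df["sceneid"].str[9:16], format="%Y%j")
print(df)
```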

[2]:
df_layers = get_sampledata(2008)
display(df_layers.head())
df_layers.band.value_counts()
sceneid band date uname path
0 LT50350322008110PAC01 b3 2008-04-19 LT5_2008-04-19_b3 /home/ben/anaconda/envs/eocube/lib/python3.6/s...
1 LT50350322008110PAC01 b4 2008-04-19 LT5_2008-04-19_b4 /home/ben/anaconda/envs/eocube/lib/python3.6/s...
2 LT50350322008110PAC01 b5 2008-04-19 LT5_2008-04-19_b5 /home/ben/anaconda/envs/eocube/lib/python3.6/s...
3 LT50350322008110PAC01 fmask 2008-04-19 LT5_2008-04-19_fmask /home/ben/anaconda/envs/eocube/lib/python3.6/s...
4 LE70350322008118EDC00 b3 2008-04-27 LE7_2008-04-27_b3 /home/ben/anaconda/envs/eocube/lib/python3.6/s...
[2]:
b5       23
b4       23
fmask    23
b3       23
Name: band, dtype: int64
[3]:
df_layers.head(5)
[3]:
sceneid band date uname path
0 LT50350322008110PAC01 b3 2008-04-19 LT5_2008-04-19_b3 /home/ben/anaconda/envs/eocube/lib/python3.6/s...
1 LT50350322008110PAC01 b4 2008-04-19 LT5_2008-04-19_b4 /home/ben/anaconda/envs/eocube/lib/python3.6/s...
2 LT50350322008110PAC01 b5 2008-04-19 LT5_2008-04-19_b5 /home/ben/anaconda/envs/eocube/lib/python3.6/s...
3 LT50350322008110PAC01 fmask 2008-04-19 LT5_2008-04-19_fmask /home/ben/anaconda/envs/eocube/lib/python3.6/s...
4 LE70350322008118EDC00 b3 2008-04-27 LE7_2008-04-27_b3 /home/ben/anaconda/envs/eocube/lib/python3.6/s...

EOCube and EOCubeChunk

EOCube

[6]:
eoc = cube.EOCube(df_layers, chunksize=2**5)

Attributes you can get.

[7]:
print("The data frame with your data.")
display(eoc.df_layers.head(2))
print("Chunksize: ", eoc.chunksize)
print("Number of Chunks: ", eoc.n_chunks)
The data frame with your data.
sceneid band date uname path
0 LT50350322008110PAC01 b3 2008-04-19 LT5_2008-04-19_b3 /home/ben/anaconda/envs/eocube/lib/python3.6/s...
1 LT50350322008110PAC01 b4 2008-04-19 LT5_2008-04-19_b4 /home/ben/anaconda/envs/eocube/lib/python3.6/s...
Chunksize:  32
Number of Chunks:  4

Attributes you can set.

[8]:
eoc.chunksize=2**4
print("Chunksize: ", eoc.chunksize)
print("Number of Chunks: ", eoc.n_chunks)

eoc.chunksize=2**5
print("Chunksize: ", eoc.chunksize)
print("Number of Chunks: ", eoc.n_chunks)
Chunksize:  16
Number of Chunks:  16
Chunksize:  32
Number of Chunks:  4
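The relation between chunksize and number of chunks can be sketched with a few lines of arithmetic: the raster is tiled into square windows of chunksize × chunksize pixels. The tiling logic and the 64 × 61-pixel raster size below are assumptions for illustration, chosen so that they reproduce the chunk counts printed above:

```python
import math

def n_chunks(n_rows, n_cols, chunksize):
    """Number of square chunksize x chunksize windows needed to tile a raster."""
    return math.ceil(n_rows / chunksize) * math.ceil(n_cols / chunksize)

# Hypothetical raster dimensions, consistent with the chunk counts above.
n_rows, n_cols = 64, 61
print(n_chunks(n_rows, n_cols, 2**5))  # 4 chunks at chunksize 32
print(n_chunks(n_rows, n_cols, 2**4))  # 16 chunks at chunksize 16
```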

EOCubeChunk

EOCubeChunk is a child class of EOCube; you get an EOCubeChunk object from an EOCube object via its get_chunk() method.

[9]:
ji = 1
eoc_chunk = eoc.get_chunk(ji)
print("Chunk ID: ", eoc_chunk.ji)
Chunk ID:  1

Load data and convert it between ndarray and DataFrame.

[10]:
ji = 1
eoc_chunk = eoc.get_chunk(ji)
print("****************")
print(eoc_chunk.__class__)
print("eochunk.data is : ", eoc_chunk.data)
print("Chunk ID: ", eoc_chunk.ji)

print("****************")
eoc_chunk = eoc_chunk.read_data()
print(eoc_chunk.data.__class__)
print("eochunk.data is ndarray with shape : ", eoc_chunk.data.shape)
data_ndarray = eoc_chunk.data.copy()

print("****************")
eoc_chunk = eoc_chunk.convert_data_to_dataframe()
print("eochunk.data is DataFrame with shape : ", eoc_chunk.data.shape)
print("(column names come from eoc_chunk.df_layers['uname'])")
display(eoc_chunk.data.head())

print("****************")
eoc_chunk = eoc_chunk.convert_data_to_ndarray()
print("eochunk.data is again a ndarray with shape : ", eoc_chunk.data.shape)

****************
<class 'eobox.raster.cube.EOCubeChunk'>
eochunk.data is :  None
Chunk ID:  1
****************
<class 'numpy.ndarray'>
eochunk.data is ndarray with shape :  (32, 29, 92)
****************
eochunk.data is DataFrame with shape :  (928, 92)
(column names come from eoc_chunk.df_layers['uname'])
uname LT5_2008-04-19_b3 LT5_2008-04-19_b4 LT5_2008-04-19_b5 LT5_2008-04-19_fmask LE7_2008-04-27_b3 LE7_2008-04-27_b4 LE7_2008-04-27_b5 LE7_2008-04-27_fmask LT5_2008-05-05_b3 LT5_2008-05-05_b4 ... LT5_2008-10-28_b5 LT5_2008-10-28_fmask LE7_2008-11-21_b3 LE7_2008-11-21_b4 LE7_2008-11-21_b5 LE7_2008-11-21_fmask LE7_2008-12-07_b3 LE7_2008-12-07_b4 LE7_2008-12-07_b5 LE7_2008-12-07_fmask
0 16000 6889 292 3 16000 7468 540 3 16000 6448 ... 2410 0 5609 6227 3191 4 5040 6126 575 3
1 6868 6584 361 3 16000 6865 584 3 6149 6025 ... 2134 0 5609 6140 3231 4 5248 6126 710 3
2 5590 5705 478 3 16000 6045 606 3 4933 5342 ... 2099 0 5550 6271 3272 4 4624 5976 802 3
3 6008 5772 501 3 16000 5795 584 3 5065 5276 ... 2514 0 5550 6227 3311 4 6176 7224 892 3
4 4917 4957 548 3 3874 4683 606 3 4134 4362 ... 2099 0 5639 6271 3391 4 5627 6126 846 3

5 rows × 92 columns

****************
eochunk.data is again a ndarray with shape :  (32, 29, 92)

Write results with ``EOCubeChunk.write_ndarray()``.

Usually we want to process all the data chunk-wise but save each of the resulting layers as a single file, i.e. with all chunks merged.

However, when we want to process the chunks in parallel, it is easier to first read, process and write the single-chunk data, and to merge the chunks once all of them are computed.

Thus, for saving data you need

  • your processing logic starting from an ndarray (if you want to start from a DataFrame see below), and

  • the destination files.
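This read-process-write pattern can be sketched in plain NumPy. Here a dummy 2-D array stands in for the raster, and the per-chunk "processing" is just a threshold; all names are illustrative and not part of the eobox API:

```python
import numpy as np

rng = np.random.default_rng(0)
raster = rng.integers(0, 100, size=(64, 61))  # dummy single-band raster
chunksize = 32

# Process each square window independently (could run in parallel).
chunks = {}
for i0 in range(0, raster.shape[0], chunksize):
    for j0 in range(0, raster.shape[1], chunksize):
        window = raster[i0:i0 + chunksize, j0:j0 + chunksize]
        chunks[(i0, j0)] = (window > 50).astype("uint8")  # per-chunk "processing"

# Later, merge the chunk results back into a full layer
# (with real rasters this merge step would typically be a VRT).
merged = np.zeros(raster.shape, dtype="uint8")
for (i0, j0), res in chunks.items():
    merged[i0:i0 + res.shape[0], j0:j0 + res.shape[1]] = res
```

Because the threshold is applied pixel-wise, merging the chunk results reproduces exactly what processing the whole raster at once would give.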

Let us get the data which will be the input to our custom process:

[11]:
ji = 1
eoc_chunk = eoc.get_chunk(ji)
eoc_chunk = eoc_chunk.read_data()

Apply our logic and get the result as an ndarray.

Here, as an example, we calculate a boolean mask of the valid pixels, derived from the fmask layers.

[12]:
ilocs_fmask = np.where((eoc_chunk.df_layers["band"] == "fmask").values)[0]
result = eoc_chunk.data[:,:,ilocs_fmask]
print("shape of data   : ", eoc_chunk.data.shape)
result = np.isin(result, [0, 1]).astype("uint8")
print("shape of result : ", result.shape)
shape of data   :  (32, 29, 92)
shape of result :  (32, 29, 23)

Note that the first two numbers of the result array shape have to be identical to the ones of the input array.

The number of layers (the third number), in contrast, has to match the number of your destination paths. Note that we only need filenames for the whole layers, not for the chunks; the filenames for the chunk files will be derived in EOCubeWindow. Here we assume that we will later merge the chunks virtually in the form of a VRT, hence the .vrt extension. So let us get the destination paths:
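These shape constraints can be summarized in a quick sanity check with dummy arrays; the shapes and the number of paths mirror those printed above, and the path names are hypothetical:

```python
import numpy as np

data = np.zeros((32, 29, 92))                   # input chunk: rows x cols x layers
result = np.zeros((32, 29, 23), dtype="uint8")  # e.g. one mask per fmask layer
dst_paths = [f"results/scene_{i}_valid.vrt" for i in range(23)]  # hypothetical names

# The spatial dimensions of the result must match the input chunk ...
assert result.shape[:2] == data.shape[:2]
# ... and the number of result layers must match the number of destination paths.
assert result.shape[2] == len(dst_paths)
```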

[13]:
dst_paths = [Path("./results") / (sid + "_valid.vrt") for sid in eoc_chunk.df_layers.iloc[ilocs_fmask]["sceneid"]]
dst_paths[:3]
[13]:
[PosixPath('results/LT50350322008110PAC01_valid.vrt'),
 PosixPath('results/LE70350322008118EDC00_valid.vrt'),
 PosixPath('results/LT50350322008126PAC01_valid.vrt')]

And finally use ``EOCubeChunk.write_ndarray()`` to save the results.

[14]:
eoc_chunk.write_ndarray(result, dst_paths)

**TODO**: Show how to execute this for all chunks with EOCube. For now, see how this is done in the next eocube tutorial: *Calculate virtual time series with EOCube*.