eobox.raster.extraction

Module for extracting values of raster data at locations given by a vector dataset.

eobox.raster.extraction.add_vector_data_attributes_to_extracted(ref_vector, pid, dir_extracted, overwrite=False)

Save attributes of the vector dataset used for extraction as .npy files that correspond to the extracted pixel values.

Parameters
  • ref_vector (str or pathlib.Path) – The vector dataset which has been used in extract().

  • pid (str) – The burn_attribute that has been used in extract(). Note that this only makes sense if the burn attribute is a unique feature (e.g. polygon) identifier.

  • dir_extracted (str or pathlib.Path) – The output directory which has been used in extract().

  • overwrite (bool, optional) – If True, existing data will be overwritten. By default False.
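The idea behind add_vector_data_attributes_to_extracted() can be illustrated with a minimal sketch, assuming the pid value of every extracted pixel is already available as a NumPy array and the vector attributes as a pandas DataFrame. The helper save_attributes_per_pixel and the `<column>.npy` file naming are hypothetical; this is not the library's implementation.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def save_attributes_per_pixel(attributes, pid, pixel_pids, dir_extracted,
                              overwrite=False):
    """Hypothetical helper: write one .npy file per attribute column.

    attributes: one row per vector feature (e.g. polygon).
    pid: name of the unique feature identifier column.
    pixel_pids: the pid value of every extracted pixel.
    """
    table = attributes.set_index(pid)
    if not table.index.is_unique:
        raise ValueError(f"{pid!r} is not a unique feature identifier")
    written = []
    for column in table.columns:
        dst = Path(dir_extracted) / f"{column}.npy"  # assumed naming scheme
        if dst.exists() and not overwrite:
            continue  # keep existing data unless overwrite=True
        # Look up the attribute of each pixel's feature; order follows pixel_pids.
        values = table[column].reindex(pixel_pids).to_numpy()
        np.save(dst, values)
        written.append(dst)
    return written


attrs = pd.DataFrame({"fid": [1, 2, 3], "landcover": [10, 20, 30]})
pixel_pids = np.array([1, 1, 3, 2, 3])
with tempfile.TemporaryDirectory() as tmp:
    paths = save_attributes_per_pixel(attrs, "fid", pixel_pids, tmp)
    print(np.load(paths[0]))  # [10 10 30 20 30]
```

The lookup only works because pid is unique per feature, which is why the parameter description stresses that requirement; pixels whose pid is missing from the table would receive NaN.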

eobox.raster.extraction.convert_df_to_geodf(df, crs=None)

Convert a dataframe returned by load_extracted() to a GeoDataFrame.

Parameters
  • df (dataframe) – Dataframe as returned by load_extracted. It must contain the columns aux_coord_x and aux_coord_y.

  • crs (None, dict, str or Path, optional) – The crs of the saved coordinates, given either as a dict, e.g. {'init':'epsg:32632'}, or via the dir_extracted. In the latter (str, Path) case, it is assumed that the crs can be derived from any TIFF file located in that folder. By default None, i.e. no crs is set.

eobox.raster.extraction.extract(src_vector: str, burn_attribute: str, src_raster: list, dst_names: list, dst_dir: str, dist2pb: bool = False, dist2rb: bool = False, src_raster_template: Optional[str] = None, gdal_dtype: int = 4, n_jobs: int = 1) -> int

Extract pixel values of a list of single-band raster files that overlap with a vector dataset.

This function does not return the extracted values but stores them in the dst_dir directory. The extracted values of each raster are stored as a separate NumPy binary file, as are the values of the burn_attribute. Additionally, the folder will contain one or more intermediate GeoTIFF files, e.g., the rasterized burn_attribute and, if selected, the dist2pb and/or dist2rb layer.

Note that the pixel coordinates are also extracted and stored as aux_coord_y and aux_coord_x. Therefore, these names should be avoided in dst_names.

The function add_vector_data_attributes_to_extracted can be used to add other attributes from src_vector to the store of extracted values such that they can be loaded easily together with the other data.

With load_extracted the data can then be loaded conveniently.

If a file with a given name already exists, the corresponding raster will be skipped.
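The core idea described above, namely that pixels with a burned value of 0 are skipped and every other pixel yields one entry in each stored array, can be sketched on in-memory NumPy arrays. The function extract_masked, the burn_attribute key, and the pixel-center convention for aux_coord_x/aux_coord_y are assumptions for illustration; the real function rasterizes src_vector, reads and writes files, and may compute coordinates differently.

```python
import numpy as np


def extract_masked(burned, layers, geotransform=(0.0, 1.0, 0.0, 0.0, 0.0, -1.0)):
    """Hypothetical sketch of the masking step of extract().

    burned: rasterized burn_attribute, 0 where no vector feature overlaps.
    layers: mapping of dst_name -> single-band array on the same grid.
    geotransform: GDAL-style (x0, dx, rx, y0, ry, dy); rotation terms ignored.
    """
    mask = burned != 0  # 0 is reserved for "not covered by the vector data"
    rows, cols = np.nonzero(mask)  # row-major order, same as boolean indexing
    out = {"burn_attribute": burned[mask]}
    for name, band in layers.items():
        if name in ("aux_coord_x", "aux_coord_y"):
            raise ValueError(f"{name!r} is reserved for the pixel coordinates")
        if band.shape != burned.shape:
            raise ValueError(f"{name}: grid differs from the rasterized vector")
        out[name] = band[mask]
    x0, dx, _, y0, _, dy = geotransform
    # Assumed convention: coordinates of pixel centers on a north-up grid.
    out["aux_coord_x"] = x0 + (cols + 0.5) * dx
    out["aux_coord_y"] = y0 + (rows + 0.5) * dy
    return out


burned = np.array([[0, 7], [7, 0]])
ndvi = np.array([[0.1, 0.5], [0.6, 0.2]])
res = extract_masked(burned, {"ndvi": ndvi})
print(res["ndvi"])  # [0.5 0.6]
```

All arrays in the result share the same length and pixel order, which is what allows them to be stored as separate files and later joined column-wise.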

Parameters
  • src_vector (str) – Filename of the vector dataset. Currently, it must have the same CRS as the raster.

  • burn_attribute (str) – Name of the attribute column in the src_vector dataset to be stored with the extracted data. This should usually be a unique ID for the features (points, lines, polygons) in the vector dataset. Note that this attribute should not contain zeros, since this value is used internally for pixels that should not be extracted or, in other words, that do not overlap with the vector data.

  • src_raster (list) – List of file paths of the single-band raster files from which to extract the pixel values.

  • dst_names (list) – List of names, corresponding to src_raster, used to store and later identify the extracted values.

  • dst_dir (str) – Directory to store the data to.

Keyword Arguments
  • dist2pb (bool) – Create an additional auxiliary layer containing the distance to the closest polygon border for each extracted pixel. Defaults to False.

  • dist2rb (bool) – Create an additional auxiliary layer containing the distance to the closest raster border for each extracted pixel. Defaults to False.

  • src_raster_template (str) – A template raster to be used for rasterizing the vector file. Usually the first element of src_raster. Defaults to None.

  • gdal_dtype (int) – Numeric GDAL data type. Defaults to 4, which is UInt32. See https://github.com/mapbox/rasterio/blob/master/rasterio/dtypes.py for useful look-up tables.

  • n_jobs (int) – Number of parallel processors to be used for extraction. -1 uses all processors. Defaults to 1.
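Because burn_attribute is rasterized with gdal_dtype and 0 marks pixels outside the vector data, a quick pre-flight check of the ID values can catch problems early. The helper check_burn_values below is hypothetical and not part of eobox; the numeric codes follow GDAL's GDALDataType enumeration (1 = Byte, 2 = UInt16, 3 = Int16, 4 = UInt32, 5 = Int32), and only these integer types are covered.

```python
import numpy as np

# Integer codes of GDAL's GDALDataType enum mapped to matching NumPy types.
GDAL_INT_TYPES = {
    1: np.uint8,   # GDT_Byte
    2: np.uint16,  # GDT_UInt16
    3: np.int16,   # GDT_Int16
    4: np.uint32,  # GDT_UInt32, the default gdal_dtype of extract()
    5: np.int32,   # GDT_Int32
}


def check_burn_values(values, gdal_dtype=4):
    """Hypothetical pre-flight check for burn_attribute values."""
    values = np.asarray(values)
    if gdal_dtype not in GDAL_INT_TYPES:
        raise ValueError(f"sketch only covers integer GDAL types, got {gdal_dtype}")
    if np.any(values == 0):
        raise ValueError("0 is reserved for pixels that are not extracted")
    info = np.iinfo(GDAL_INT_TYPES[gdal_dtype])
    if values.min() < info.min or values.max() > info.max:
        raise ValueError(f"values do not fit into GDAL data type {gdal_dtype}")


check_burn_values([1, 2, 3])  # passes silently with the default UInt32
```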

Returns

int – If successful, the function returns 0 as an exit code, and 1 otherwise.

eobox.raster.extraction.get_paths_of_extracted(src_dir: str, patterns='*.npy', sort=True)

Get the paths of extracted features. Used in load_extracted.
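A minimal stand-in for get_paths_of_extracted() might look as follows, assuming it simply globs src_dir with one or more patterns and optionally sorts the result. The name find_extracted and the de-duplication of files matched by several patterns are assumptions, not documented behavior.

```python
import tempfile
from pathlib import Path


def find_extracted(src_dir, patterns="*.npy", sort=True):
    """Hypothetical stand-in for get_paths_of_extracted()."""
    if isinstance(patterns, str):
        patterns = [patterns]
    paths = []
    for pattern in patterns:
        for path in Path(src_dir).glob(pattern):
            if path not in paths:  # a file may match more than one pattern
                paths.append(path)
    if sort:
        paths.sort()
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for name in ["ndvi.npy", "aux_coord_x.npy", "notes.txt"]:
        (Path(tmp) / name).touch()
    print([p.name for p in find_extracted(tmp)])  # ['aux_coord_x.npy', 'ndvi.npy']
```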

eobox.raster.extraction.load_extracted(src_dir: str, patterns='*.npy', vars_in_cols: bool = True, index: Optional[pandas.core.series.Series] = None, head: bool = False, sort: bool = True)

Load data extracted and stored by extract().

Parameters
  • src_dir (str) – The directory where the data is stored.

Keyword Arguments
  • patterns (str or list of str) – A pattern to identify the variables to be loaded. The default loads all variables, i.e. all .npy files. Defaults to '*.npy'.

  • vars_in_cols (bool) – Return the variables in columns. Defaults to True.

  • index (pd.Series) – A boolean pandas Series which indicates with True which samples to load. Defaults to None.

  • head (bool) – Get a dataframe with the first five samples. Defaults to False.

Returns

pandas.DataFrame – A dataframe with the data.
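The loading side can be sketched as follows, assuming each variable is stored as one 1-D .npy file named after the variable, as the description of extract() suggests. load_npy_dir is a hypothetical illustration of how patterns, index, head and vars_in_cols interact; it does not reproduce load_extracted() exactly (sort handling and dtype details, for instance, may differ).

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def load_npy_dir(src_dir, patterns="*.npy", vars_in_cols=True, index=None,
                 head=False):
    """Hypothetical sketch of load_extracted(): one 1-D .npy file per variable."""
    if isinstance(patterns, str):
        patterns = [patterns]
    paths = sorted({p for pat in patterns for p in Path(src_dir).glob(pat)})
    data = {}
    for path in paths:
        values = np.load(path)
        if index is not None:
            # Keep only the samples flagged True in the boolean index.
            values = values[np.asarray(index, dtype=bool)]
        data[path.stem] = values  # the file stem becomes the variable name
    df = pd.DataFrame(data)
    if head:
        df = df.head(5)
    # With vars_in_cols=False the variables end up in rows instead.
    return df if vars_in_cols else df.T


with tempfile.TemporaryDirectory() as tmp:
    np.save(Path(tmp) / "ndvi.npy", np.array([0.1, 0.5, 0.6]))
    np.save(Path(tmp) / "aux_coord_x.npy", np.array([10.0, 20.0, 30.0]))
    keep = pd.Series([True, False, True])
    df = load_npy_dir(tmp, index=keep)
    print(df.shape)  # (2, 2)
```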

eobox.raster.extraction.load_extracted_dask(npy_path_list, index=None)

Create a dask dataframe from a list of .npy paths, each holding a single feature, to be concatenated along the columns.

eobox.raster.extraction.load_extracted_partitions(src_dir: dict, patterns='*.npy', index: Optional[dict] = None, to_crs: Optional[dict] = None, verbosity=0)

Load multiple row-wise appended partitions (same columns) with load_extracted().

Parameters
  • src_dir (dict of str or Path) – Multiple src_dir as in load_extracted(), wrapped in a dictionary where the keys are the partition identifiers. The key will be written as a column in the returned dataframe.

Keyword Arguments
  • patterns (str or list of str) – See load_extracted().

  • index (dict of pd.Series) – See load_extracted(), but given as a dict as in src_dir.

Returns

pandas.DataFrame – A dataframe with the data.
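Row-wise appending of partitions can be sketched with pandas.concat. The helper concat_partitions and the column name partition for the dictionary key are hypothetical; the description above only states that the key is written as a column.

```python
import pandas as pd


def concat_partitions(frames, key_col="partition"):
    """Hypothetical sketch: append per-partition DataFrames row-wise."""
    parts, columns = [], None
    for key, df in frames.items():
        # All partitions are expected to share the same columns.
        if columns is None:
            columns = list(df.columns)
        elif list(df.columns) != columns:
            raise ValueError(f"partition {key!r} has different columns")
        part = df.copy()
        part[key_col] = key  # record the partition identifier per row
        parts.append(part)
    return pd.concat(parts, axis=0, ignore_index=True)


frames = {
    "tile_a": pd.DataFrame({"ndvi": [0.1, 0.5]}),
    "tile_b": pd.DataFrame({"ndvi": [0.6]}),
}
print(concat_partitions(frames)["partition"].tolist())  # ['tile_a', 'tile_a', 'tile_b']
```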

eobox.raster.extraction.load_extracted_partitions_dask(src_dir: dict, global_index_col: str, patterns='*.npy', verbosity=0)

Load multiple row-wise appended partitions (same columns) with load_extracted() as dask dataframe.

Parameters
  • src_dir (dict of str or Path) – Multiple src_dir as in load_extracted(), wrapped in a dictionary where the keys are the partition identifiers. The key will be written as a column in the returned dataframe.

  • global_index_col (str) – One of the columns matched by the patterns should be a global index, i.e. an index where each element is unique over all partitions.

Keyword Arguments
  • patterns (str or list of str) – See load_extracted().

Returns

dask.dataframe.core.DataFrame – A dask dataframe with the data.
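The requirement on global_index_col can be checked before building the dask dataframe. The hypothetical helper is_global_index takes the index values of each partition as a dictionary (mirroring src_dir) and tests that no value occurs twice across all partitions.

```python
import numpy as np


def is_global_index(index_by_partition):
    """Hypothetical check: True if index values are unique over all partitions."""
    values = np.concatenate([np.asarray(v) for v in index_by_partition.values()])
    return np.unique(values).size == values.size


print(is_global_index({"tile_a": [0, 1, 2], "tile_b": [3, 4]}))  # True
print(is_global_index({"tile_a": [0, 1, 2], "tile_b": [2, 3]}))  # False
```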