Get Landsat Time Series (lsts) Sample Data¶
In this notebook we download the Forest in Colorado, USA (P035-R032) dataset from Chris Holdens landsat_stack repository and save part of the data as single layer TIFF files.
Specifically, we run through the following steps:
We download and extract the data.
We save the RED, NIR and SWIR1 bands and the FMASK band from the years 2008 to 2013 as single layer TIFF files.
We delete the stack.
We create a dataframe with all .tif files.
The following is a lists of all Landsat / Fmask bands we download. The ones that are not striked out are the ones we keep.
Band 1 SR (SR * 10000)
Band 2 SR (SR * 10000)
Band 3 SR (SR * 10000)
Band 4 SR (SR * 10000)
Band 5 SR (SR * 10000)
Band 7 SR (SR * 10000)
Band 6 Thermal Brightness (C * 100)
Fmask
0 - clear land
1 - clear water
2 - cloud
3 - snow
4 - shadow
255 - NoData
The dataset is a sample dataset included under the name lsts (Landsat time series) in the eotools-dataset
package.
First, download and extract the data.
[1]:
! wget http://ftp-earth.bu.edu/public/ceholden/landsat_stacks/p035r032.tar.bz2
!tar xf p035r032.tar.bz2
--2019-02-03 13:51:17-- http://ftp-earth.bu.edu/public/ceholden/landsat_stacks/p035r032.tar.bz2
Resolving ftp-earth.bu.edu (ftp-earth.bu.edu)... 128.197.229.67
Connecting to ftp-earth.bu.edu (ftp-earth.bu.edu)|128.197.229.67|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11963073 (11M) [application/x-bzip2]
Saving to: ‘p035r032.tar.bz2’
p035r032.tar.bz2 100%[===================>] 11,41M 535KB/s in 28s
2019-02-03 13:51:46 (418 KB/s) - ‘p035r032.tar.bz2’ saved [11963073/11963073]
Imports and the helper functions.
[3]:
import os
import pandas as pd
from pathlib import Path
import shutil
import subprocess
def save_as_single_layer_file(src_dir, overwrite=False, remove_stack=True):
keep = [2, 3, 4, 7]
band_names = ["b1", "b2", "b3", "b4", "b5", "b7", "b6", "fmask"]
src = list(src_dir.glob("*gtif"))[0]
for bindex, bname in enumerate(band_names):
if bindex not in keep:
continue
dst_dir = sdir.parent.parent / src_dir.stem
dst_dir.mkdir(exist_ok=True)
dst = dst_dir / f"{src_dir.stem.split('_')[0]}_{band_names[bindex]}.tif"
if (not dst.exists() or overwrite):
ot = "Byte" if bname == "fmask" else "Int16"
exit_code = subprocess.check_call(
f"gdal_translate -ot {ot} -b {bindex+1} -co COMPRESS=DEFLATE {str(src)} {str(dst)}",
shell=True)
Save the selected bands and fmask of the selected years as single layer files.
[4]:
bdir_scenes_single = Path("./p035r032")
bdir_scenes = Path("./p035r032/images")
scene_dirs = list(bdir_scenes.glob("L*"))
counter = 0
for i, sdir in enumerate(scene_dirs):
if int(sdir.stem[9:13]) < 2008:
continue
counter += 1
print(f"{counter} / {len(scene_dirs)} - {sdir}")
save_as_single_layer_file(src_dir=sdir, overwrite=False)
1 / 447 - p035r032/images/LE70350322009312EDC00
2 / 447 - p035r032/images/LE70350322010283EDC00
3 / 447 - p035r032/images/LE70350322011110EDC00
4 / 447 - p035r032/images/LT50350322011278PAC01
5 / 447 - p035r032/images/LE70350322008262EDC00
6 / 447 - p035r032/images/LT50350322009256PAC01
7 / 447 - p035r032/images/LT50350322009128PAC01
8 / 447 - p035r032/images/LT50350322009208PAC01
9 / 447 - p035r032/images/LT50350322011246PAC01
10 / 447 - p035r032/images/LE70350322008214EDC00
11 / 447 - p035r032/images/LE70350322009072EDC00
12 / 447 - p035r032/images/LE70350322012257EDC01
13 / 447 - p035r032/images/LE70350322012241EDC00
14 / 447 - p035r032/images/LE70350322009248EDC00
15 / 447 - p035r032/images/LE70350322008198EDC00
16 / 447 - p035r032/images/LT50350322010147PAC01
17 / 447 - p035r032/images/LT50350322008190PAC01
18 / 447 - p035r032/images/LE70350322012113EDC00
19 / 447 - p035r032/images/LE70350322011238EDC00
20 / 447 - p035r032/images/LE70350322011190EDC00
21 / 447 - p035r032/images/LE70350322012081EDC00
22 / 447 - p035r032/images/LE70350322009120EDC00
23 / 447 - p035r032/images/LE70350322011174EDC00
24 / 447 - p035r032/images/LE70350322009088EDC00
25 / 447 - p035r032/images/LT50350322010259PAC02
26 / 447 - p035r032/images/LT50350322008174PAC01
27 / 447 - p035r032/images/LE70350322012209EDC00
28 / 447 - p035r032/images/LE70350322012337EDC00
29 / 447 - p035r032/images/LT50350322011262PAC01
30 / 447 - p035r032/images/LT50350322009240PAC02
31 / 447 - p035r032/images/LE70350322008118EDC00
32 / 447 - p035r032/images/LT50350322011294PAC01
33 / 447 - p035r032/images/LT50350322008126PAC01
34 / 447 - p035r032/images/LT50350322008270PAC01
35 / 447 - p035r032/images/LE70350322010267EDC00
36 / 447 - p035r032/images/LE70350322012273EDC00
37 / 447 - p035r032/images/LE70350322010187EDC01
38 / 447 - p035r032/images/LE70350322008326EDC00
39 / 447 - p035r032/images/LT50350322010179EDC00
40 / 447 - p035r032/images/LT50350322008222PAC01
41 / 447 - p035r032/images/LT50350322009288PAC01
42 / 447 - p035r032/images/LE70350322011206EDC00
43 / 447 - p035r032/images/LE70350322010155EDC00
44 / 447 - p035r032/images/LT50350322009272PAC01
45 / 447 - p035r032/images/LE70350322012129EDC00
46 / 447 - p035r032/images/LE70350322010219EDC03
47 / 447 - p035r032/images/LE70350322013131EDC00
48 / 447 - p035r032/images/LE70350322012193EDC00
49 / 447 - p035r032/images/LE70350322012289EDC00
50 / 447 - p035r032/images/LE70350322012161EDC00
51 / 447 - p035r032/images/LT50350322011118PAC01
52 / 447 - p035r032/images/LT50350322009224PAC01
53 / 447 - p035r032/images/LT50350322008110PAC01
54 / 447 - p035r032/images/LE70350322009216EDC00
55 / 447 - p035r032/images/LE70350322008342EDC00
56 / 447 - p035r032/images/LT50350322010275PAC01
57 / 447 - p035r032/images/LE70350322010203EDC00
58 / 447 - p035r032/images/LE70350322009232EDC00
59 / 447 - p035r032/images/LT50350322010211EDC00
60 / 447 - p035r032/images/LE70350322011270EDC00
61 / 447 - p035r032/images/LT50350322010195EDC00
62 / 447 - p035r032/images/LE70350322008246EDC00
63 / 447 - p035r032/images/LT50350322011134PAC01
64 / 447 - p035r032/images/LT50350322011166PAC01
65 / 447 - p035r032/images/LE70350322012145EDC00
66 / 447 - p035r032/images/LT50350322009112PAC01
67 / 447 - p035r032/images/LT50350322009176PAC01
68 / 447 - p035r032/images/LE70350322012225EDC00
69 / 447 - p035r032/images/LT50350322008302PAC01
70 / 447 - p035r032/images/LT50350322011182PAC01
71 / 447 - p035r032/images/LT50350322011230PAC01
72 / 447 - p035r032/images/LT50350322009192PAC01
73 / 447 - p035r032/images/LT50350322008206PAC01
74 / 447 - p035r032/images/LE70350322009296EDC00
75 / 447 - p035r032/images/LE70350322013147EDC00
76 / 447 - p035r032/images/LT50350322008158PAC01
77 / 447 - p035r032/images/LT50350322008238PAC01
78 / 447 - p035r032/images/LE70350322009136EDC00
79 / 447 - p035r032/images/LT50350322010099PAC01
80 / 447 - p035r032/images/LT50350322008142PAC01
81 / 447 - p035r032/images/LE70350322012305EDC00
82 / 447 - p035r032/images/LT50350322009160PAC01
83 / 447 - p035r032/images/LE70350322010235EDC00
84 / 447 - p035r032/images/LE70350322012177EDC00
85 / 447 - p035r032/images/LE70350322010171EDC00
86 / 447 - p035r032/images/LE70350322011142EDC00
87 / 447 - p035r032/images/LE70350322011286EDC00
88 / 447 - p035r032/images/LE70350322008166EDC00
89 / 447 - p035r032/images/LT50350322008286PAC01
90 / 447 - p035r032/images/LE70350322008230EDC00
91 / 447 - p035r032/images/LE70350322012321EDC00
92 / 447 - p035r032/images/LE70350322011254EDC00
93 / 447 - p035r032/images/LE70350322013115EDC00
94 / 447 - p035r032/images/LE70350322011158EDC00
95 / 447 - p035r032/images/LE70350322009184EDC00
96 / 447 - p035r032/images/LT50350322011214PAC01
97 / 447 - p035r032/images/LT50350322011198PAC01
98 / 447 - p035r032/images/LE70350322012097EDC00
99 / 447 - p035r032/images/LT50350322010243EDC00
100 / 447 - p035r032/images/LE70350322009200EDC00
101 / 447 - p035r032/images/LT50350322010307EDC00
102 / 447 - p035r032/images/LT50350322010227EDC00
103 / 447 - p035r032/images/LE70350322011222EDC00
104 / 447 - p035r032/images/LE70350322008150EDC00
105 / 447 - p035r032/images/LE70350322008182EDC00
Delete the image directory which contained the complete downloaded data.
[6]:
shutil.rmtree(bdir_scenes)
Lets derive the paths and some thereof derived metadata and put the info in a dataframe.
[8]:
layers_paths = list(Path(bdir_scenes_single).rglob("*.tif"))
layers_df = pd.Series([p.stem for p in layers_paths]).str.split("_", expand=True) \
.rename({0: "sceneid", 1:"band"}, axis=1)
layers_df["date"] = pd.to_datetime(layers_df.sceneid.str[9:16], format="%Y%j")
layers_df["path"] = layers_paths
layers_df = layers_df.sort_values(["date", "band"])
layers_df = layers_df.reset_index(drop=True)
layers_df.head(10)
[8]:
sceneid | band | date | path | |
---|---|---|---|---|
0 | LT50350322008110PAC01 | b3 | 2008-04-19 | p035r032/LT50350322008110PAC01/LT5035032200811... |
1 | LT50350322008110PAC01 | b4 | 2008-04-19 | p035r032/LT50350322008110PAC01/LT5035032200811... |
2 | LT50350322008110PAC01 | b5 | 2008-04-19 | p035r032/LT50350322008110PAC01/LT5035032200811... |
3 | LT50350322008110PAC01 | fmask | 2008-04-19 | p035r032/LT50350322008110PAC01/LT5035032200811... |
4 | LE70350322008118EDC00 | b3 | 2008-04-27 | p035r032/LE70350322008118EDC00/LE7035032200811... |
5 | LE70350322008118EDC00 | b4 | 2008-04-27 | p035r032/LE70350322008118EDC00/LE7035032200811... |
6 | LE70350322008118EDC00 | b5 | 2008-04-27 | p035r032/LE70350322008118EDC00/LE7035032200811... |
7 | LE70350322008118EDC00 | fmask | 2008-04-27 | p035r032/LE70350322008118EDC00/LE7035032200811... |
8 | LT50350322008126PAC01 | b3 | 2008-05-05 | p035r032/LT50350322008126PAC01/LT5035032200812... |
9 | LT50350322008126PAC01 | b4 | 2008-05-05 | p035r032/LT50350322008126PAC01/LT5035032200812... |
Reformat the data such that we can check if some of the scenes have missing bands.
[9]:
counts_bands_per_sceneid = layers_df[["sceneid", "band", "path"]] \
.pivot_table(index="sceneid", columns="band", aggfunc="count")
display(counts_bands_per_sceneid.head(2))
display(counts_bands_per_sceneid.tail(2))
counts_bands_per_sceneid.apply("sum", axis=0)
path | ||||
---|---|---|---|---|
band | b3 | b4 | b5 | fmask |
sceneid | ||||
LE70350322008118EDC00 | 1 | 1 | 1 | 1 |
LE70350322008150EDC00 | 1 | 1 | 1 | 1 |
path | ||||
---|---|---|---|---|
band | b3 | b4 | b5 | fmask |
sceneid | ||||
LT50350322011278PAC01 | 1 | 1 | 1 | 1 |
LT50350322011294PAC01 | 1 | 1 | 1 | 1 |
[9]:
band
path b3 105
b4 105
b5 105
fmask 105
dtype: int64
Which is not the case (;
The End