Get Landsat Time Series (lsts) Sample Data

In this notebook we download the Forest in Colorado, USA (P035-R032) dataset from Chris Holdens landsat_stack repository and save part of the data as single layer TIFF files.

Specifically, we run through the following steps:

  • We download and extract the data.

  • We save the RED, NIR and SWIR1 bands and the FMASK band from the years 2008 to 2013 as single layer TIFF files.

  • We delete the stack.

  • We create a dataframe with all .tif files.

The following is a lists of all Landsat / Fmask bands we download. The ones that are not striked out are the ones we keep.

  • Band 1 SR (SR * 10000)

  • Band 2 SR (SR * 10000)

  • Band 3 SR (SR * 10000)

  • Band 4 SR (SR * 10000)

  • Band 5 SR (SR * 10000)

  • Band 7 SR (SR * 10000)

  • Band 6 Thermal Brightness (C * 100)

  • Fmask

    • 0 - clear land

    • 1 - clear water

    • 2 - cloud

    • 3 - snow

    • 4 - shadow

    • 255 - NoData

The dataset is a sample dataset included under the name lsts (Landsat time series) in the eotools-dataset package.

First, download and extract the data.

[1]:
! wget http://ftp-earth.bu.edu/public/ceholden/landsat_stacks/p035r032.tar.bz2
!tar xf p035r032.tar.bz2
--2019-02-03 13:51:17--  http://ftp-earth.bu.edu/public/ceholden/landsat_stacks/p035r032.tar.bz2
Resolving ftp-earth.bu.edu (ftp-earth.bu.edu)... 128.197.229.67
Connecting to ftp-earth.bu.edu (ftp-earth.bu.edu)|128.197.229.67|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11963073 (11M) [application/x-bzip2]
Saving to: ‘p035r032.tar.bz2’

p035r032.tar.bz2    100%[===================>]  11,41M   535KB/s    in 28s

2019-02-03 13:51:46 (418 KB/s) - ‘p035r032.tar.bz2’ saved [11963073/11963073]

Imports and the helper functions.

[3]:
import os
import pandas as pd
from pathlib import Path
import shutil
import subprocess

def save_as_single_layer_file(src_dir, overwrite=False, remove_stack=True):
    keep = [2, 3, 4, 7]
    band_names = ["b1", "b2", "b3", "b4", "b5", "b7", "b6", "fmask"]
    src = list(src_dir.glob("*gtif"))[0]
    for bindex, bname in enumerate(band_names):
        if bindex not in keep:
            continue

        dst_dir = sdir.parent.parent / src_dir.stem
        dst_dir.mkdir(exist_ok=True)
        dst = dst_dir / f"{src_dir.stem.split('_')[0]}_{band_names[bindex]}.tif"
        if (not dst.exists() or overwrite):
            ot = "Byte" if bname == "fmask" else "Int16"
            exit_code = subprocess.check_call(
                f"gdal_translate -ot {ot} -b {bindex+1} -co COMPRESS=DEFLATE {str(src)} {str(dst)}",
                shell=True)

Save the selected bands and fmask of the selected years as single layer files.

[4]:
bdir_scenes_single = Path("./p035r032")
bdir_scenes = Path("./p035r032/images")
scene_dirs = list(bdir_scenes.glob("L*"))

counter = 0
for i, sdir in enumerate(scene_dirs):
    if int(sdir.stem[9:13]) < 2008:
        continue
    counter += 1
    print(f"{counter} / {len(scene_dirs)} - {sdir}")
    save_as_single_layer_file(src_dir=sdir, overwrite=False)
1 / 447 - p035r032/images/LE70350322009312EDC00
2 / 447 - p035r032/images/LE70350322010283EDC00
3 / 447 - p035r032/images/LE70350322011110EDC00
4 / 447 - p035r032/images/LT50350322011278PAC01
5 / 447 - p035r032/images/LE70350322008262EDC00
6 / 447 - p035r032/images/LT50350322009256PAC01
7 / 447 - p035r032/images/LT50350322009128PAC01
8 / 447 - p035r032/images/LT50350322009208PAC01
9 / 447 - p035r032/images/LT50350322011246PAC01
10 / 447 - p035r032/images/LE70350322008214EDC00
11 / 447 - p035r032/images/LE70350322009072EDC00
12 / 447 - p035r032/images/LE70350322012257EDC01
13 / 447 - p035r032/images/LE70350322012241EDC00
14 / 447 - p035r032/images/LE70350322009248EDC00
15 / 447 - p035r032/images/LE70350322008198EDC00
16 / 447 - p035r032/images/LT50350322010147PAC01
17 / 447 - p035r032/images/LT50350322008190PAC01
18 / 447 - p035r032/images/LE70350322012113EDC00
19 / 447 - p035r032/images/LE70350322011238EDC00
20 / 447 - p035r032/images/LE70350322011190EDC00
21 / 447 - p035r032/images/LE70350322012081EDC00
22 / 447 - p035r032/images/LE70350322009120EDC00
23 / 447 - p035r032/images/LE70350322011174EDC00
24 / 447 - p035r032/images/LE70350322009088EDC00
25 / 447 - p035r032/images/LT50350322010259PAC02
26 / 447 - p035r032/images/LT50350322008174PAC01
27 / 447 - p035r032/images/LE70350322012209EDC00
28 / 447 - p035r032/images/LE70350322012337EDC00
29 / 447 - p035r032/images/LT50350322011262PAC01
30 / 447 - p035r032/images/LT50350322009240PAC02
31 / 447 - p035r032/images/LE70350322008118EDC00
32 / 447 - p035r032/images/LT50350322011294PAC01
33 / 447 - p035r032/images/LT50350322008126PAC01
34 / 447 - p035r032/images/LT50350322008270PAC01
35 / 447 - p035r032/images/LE70350322010267EDC00
36 / 447 - p035r032/images/LE70350322012273EDC00
37 / 447 - p035r032/images/LE70350322010187EDC01
38 / 447 - p035r032/images/LE70350322008326EDC00
39 / 447 - p035r032/images/LT50350322010179EDC00
40 / 447 - p035r032/images/LT50350322008222PAC01
41 / 447 - p035r032/images/LT50350322009288PAC01
42 / 447 - p035r032/images/LE70350322011206EDC00
43 / 447 - p035r032/images/LE70350322010155EDC00
44 / 447 - p035r032/images/LT50350322009272PAC01
45 / 447 - p035r032/images/LE70350322012129EDC00
46 / 447 - p035r032/images/LE70350322010219EDC03
47 / 447 - p035r032/images/LE70350322013131EDC00
48 / 447 - p035r032/images/LE70350322012193EDC00
49 / 447 - p035r032/images/LE70350322012289EDC00
50 / 447 - p035r032/images/LE70350322012161EDC00
51 / 447 - p035r032/images/LT50350322011118PAC01
52 / 447 - p035r032/images/LT50350322009224PAC01
53 / 447 - p035r032/images/LT50350322008110PAC01
54 / 447 - p035r032/images/LE70350322009216EDC00
55 / 447 - p035r032/images/LE70350322008342EDC00
56 / 447 - p035r032/images/LT50350322010275PAC01
57 / 447 - p035r032/images/LE70350322010203EDC00
58 / 447 - p035r032/images/LE70350322009232EDC00
59 / 447 - p035r032/images/LT50350322010211EDC00
60 / 447 - p035r032/images/LE70350322011270EDC00
61 / 447 - p035r032/images/LT50350322010195EDC00
62 / 447 - p035r032/images/LE70350322008246EDC00
63 / 447 - p035r032/images/LT50350322011134PAC01
64 / 447 - p035r032/images/LT50350322011166PAC01
65 / 447 - p035r032/images/LE70350322012145EDC00
66 / 447 - p035r032/images/LT50350322009112PAC01
67 / 447 - p035r032/images/LT50350322009176PAC01
68 / 447 - p035r032/images/LE70350322012225EDC00
69 / 447 - p035r032/images/LT50350322008302PAC01
70 / 447 - p035r032/images/LT50350322011182PAC01
71 / 447 - p035r032/images/LT50350322011230PAC01
72 / 447 - p035r032/images/LT50350322009192PAC01
73 / 447 - p035r032/images/LT50350322008206PAC01
74 / 447 - p035r032/images/LE70350322009296EDC00
75 / 447 - p035r032/images/LE70350322013147EDC00
76 / 447 - p035r032/images/LT50350322008158PAC01
77 / 447 - p035r032/images/LT50350322008238PAC01
78 / 447 - p035r032/images/LE70350322009136EDC00
79 / 447 - p035r032/images/LT50350322010099PAC01
80 / 447 - p035r032/images/LT50350322008142PAC01
81 / 447 - p035r032/images/LE70350322012305EDC00
82 / 447 - p035r032/images/LT50350322009160PAC01
83 / 447 - p035r032/images/LE70350322010235EDC00
84 / 447 - p035r032/images/LE70350322012177EDC00
85 / 447 - p035r032/images/LE70350322010171EDC00
86 / 447 - p035r032/images/LE70350322011142EDC00
87 / 447 - p035r032/images/LE70350322011286EDC00
88 / 447 - p035r032/images/LE70350322008166EDC00
89 / 447 - p035r032/images/LT50350322008286PAC01
90 / 447 - p035r032/images/LE70350322008230EDC00
91 / 447 - p035r032/images/LE70350322012321EDC00
92 / 447 - p035r032/images/LE70350322011254EDC00
93 / 447 - p035r032/images/LE70350322013115EDC00
94 / 447 - p035r032/images/LE70350322011158EDC00
95 / 447 - p035r032/images/LE70350322009184EDC00
96 / 447 - p035r032/images/LT50350322011214PAC01
97 / 447 - p035r032/images/LT50350322011198PAC01
98 / 447 - p035r032/images/LE70350322012097EDC00
99 / 447 - p035r032/images/LT50350322010243EDC00
100 / 447 - p035r032/images/LE70350322009200EDC00
101 / 447 - p035r032/images/LT50350322010307EDC00
102 / 447 - p035r032/images/LT50350322010227EDC00
103 / 447 - p035r032/images/LE70350322011222EDC00
104 / 447 - p035r032/images/LE70350322008150EDC00
105 / 447 - p035r032/images/LE70350322008182EDC00

Delete the image directory which contained the complete downloaded data.

[6]:
shutil.rmtree(bdir_scenes)

Lets derive the paths and some thereof derived metadata and put the info in a dataframe.

[8]:
layers_paths = list(Path(bdir_scenes_single).rglob("*.tif"))
layers_df = pd.Series([p.stem for p in layers_paths]).str.split("_", expand=True) \
    .rename({0: "sceneid", 1:"band"}, axis=1)
layers_df["date"] = pd.to_datetime(layers_df.sceneid.str[9:16], format="%Y%j")
layers_df["path"] = layers_paths
layers_df = layers_df.sort_values(["date", "band"])
layers_df = layers_df.reset_index(drop=True)
layers_df.head(10)
[8]:
sceneid band date path
0 LT50350322008110PAC01 b3 2008-04-19 p035r032/LT50350322008110PAC01/LT5035032200811...
1 LT50350322008110PAC01 b4 2008-04-19 p035r032/LT50350322008110PAC01/LT5035032200811...
2 LT50350322008110PAC01 b5 2008-04-19 p035r032/LT50350322008110PAC01/LT5035032200811...
3 LT50350322008110PAC01 fmask 2008-04-19 p035r032/LT50350322008110PAC01/LT5035032200811...
4 LE70350322008118EDC00 b3 2008-04-27 p035r032/LE70350322008118EDC00/LE7035032200811...
5 LE70350322008118EDC00 b4 2008-04-27 p035r032/LE70350322008118EDC00/LE7035032200811...
6 LE70350322008118EDC00 b5 2008-04-27 p035r032/LE70350322008118EDC00/LE7035032200811...
7 LE70350322008118EDC00 fmask 2008-04-27 p035r032/LE70350322008118EDC00/LE7035032200811...
8 LT50350322008126PAC01 b3 2008-05-05 p035r032/LT50350322008126PAC01/LT5035032200812...
9 LT50350322008126PAC01 b4 2008-05-05 p035r032/LT50350322008126PAC01/LT5035032200812...

Reformat the data such that we can check if some of the scenes have missing bands.

[9]:
counts_bands_per_sceneid = layers_df[["sceneid", "band", "path"]] \
    .pivot_table(index="sceneid", columns="band", aggfunc="count")
display(counts_bands_per_sceneid.head(2))
display(counts_bands_per_sceneid.tail(2))
counts_bands_per_sceneid.apply("sum", axis=0)
path
band b3 b4 b5 fmask
sceneid
LE70350322008118EDC00 1 1 1 1
LE70350322008150EDC00 1 1 1 1
path
band b3 b4 b5 fmask
sceneid
LT50350322011278PAC01 1 1 1 1
LT50350322011294PAC01 1 1 1 1
[9]:
      band
path  b3       105
      b4       105
      b5       105
      fmask    105
dtype: int64

Which is not the case (;

The End