HYDRO1k Documentation

Table of Contents

1.0. Introduction
2.0. Data Layers
3.0. Data Set Development
3.1. Data Processing Procedures
3.1.1. Project the DEM
3.1.2. Identify Natural Sink Features
3.1.3. Filling the DEM
3.1.4. Verification of the DEM
3.2. Generation of Derivative Raster Data Sets
3.2.1. Aspect
3.2.2. Flow Directions
3.2.3. Flow Accumulations
3.2.4. Slope
3.2.5. Shaded Relief Representation
3.3. Generation of Derivative Vector Data Sets
3.3.1. Stream Lines
3.3.2. Drainage Basin Boundaries
4.0. Data Formats
4.1. Vector Data Formats
4.2. Raster Data Formats
4.2.1. Image File (.bil)
4.2.2. Header File (.hdr)
4.2.3. World File (.blw)
4.2.4. Statistics File (.sta)
5.0. Data Distribution
6.0. Notes and Hints for HYDRO1k Users
7.0. Summary
8.0. References
9.0. Disclaimers



1.0. Introduction

HYDRO1k, developed at the U.S. Geological Survey’s (USGS) EROS Data Center, is a geographic database providing comprehensive and consistent global coverage of topographically derived data sets. Developed from the USGS' recently released 30 arc-second digital elevation model (DEM) of the world (GTOPO30), HYDRO1k provides a standard suite of geo-referenced data sets (at a resolution of 1 km) that will be of value for all users who need to organize, evaluate, or process hydrologic information on a continental scale.

Constructive comments from users of the HYDRO1k data sets are welcomed. Please send your comments to kverdin@edcmail.cr.usgs.gov or sgreenlee@edcmail.cr.usgs.gov.

2.0. Data Layers

The HYDRO1k data sets are being developed on a continent by continent basis, for all landmasses of the globe with the exception of Antarctica and Greenland. The HYDRO1k package provides, for each continent, a suite of six raster and two vector data sets. These data sets cover many of the common derivative products used in hydrologic analysis. The raster data sets are the hydrologically correct digital elevation model (DEM) and corresponding shaded relief rendition, derived flow directions, flow accumulations, slope and aspect. The derived streamlines and basins are distributed as vector data sets.

3.0. Data Set Development

The HYDRO1k data sets are the result of a cooperative project at the U.S. Geological Survey’s (USGS) EROS Data Center. The goal of the project is the development of a globally consistent hydrologic derivative data set. The effort has been led by USGS scientists with additional staffing from the United Nations Environment Programme/Global Resource Information Database (UNEP/GRID) located in Sioux Falls, South Dakota.

Development of the HYDRO1k database was made possible by the completion of the 30 arc-second digital elevation model at the EROS Data Center in 1996, entitled GTOPO30. This data set, with its nominal cell size of 1 km, has been and will continue to be applied by many scientists and researchers to hydrologic and land form studies. Inevitably, these studies require development, at a minimum, of a standard suite of derivative products. In the past, users would obtain the DEM data, process the data, extract the derivative information, use the derived products in their studies and, perhaps, share the derived information with others. To reduce repetition of these procedures by every user of the data set, the HYDRO1k database aims to provide these standard products, developed in a consistent fashion for the entire globe, and to make them available to the entire user community.

3.1. Data Processing Procedures

The basis of all of the data layers available in the HYDRO1k database is the hydrologically correct DEM. This DEM is based on the GTOPO30 data set. However, to ensure that the DEM reproduces the correct movement of water across its surface, it is processed to remove elevation anomalies that can interfere with hydrologically correct flow. The procedures followed in development of this DEM are iterative. Some of the techniques used in the DEM development are documented in Danielson (1996).

3.1.1. Project the DEM

In order to properly perform area calculations on the DEM, the data are projected into an equal area projection. The Lambert Azimuthal Equal Area projection was selected for this database (Steinwand et al., 1995). The cell size for all continents is 1,000 meters and the radius of the sphere of influence is 6,370,997 meters. Projection parameters that vary by continent are given in the following table. Other geo-referencing information is available in the projection file that is included with each continental data set.

Continent        Longitude of Origin    Latitude of Origin
Africa           20° 00' 00"E           5° 00' 00"N
Asia             100° 00' 00"E          45° 00' 00"N
Australasia      135° 00' 00"E          15° 00' 00"S
Europe           20° 00' 00"E           55° 00' 00"N
North America    100° 00' 00"W          45° 00' 00"N
South America    60° 00' 00"W           15° 00' 00"S
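
As an illustration only, the forward projection implied by these parameters can be sketched in Python using the standard spherical Lambert Azimuthal Equal Area formulas. The names laea_forward and ORIGINS are hypothetical, and false easting and northing are assumed to be zero; the projection file shipped with each continent remains the authoritative source.

```python
import math

R = 6370997.0  # sphere radius in meters, as stated above

# Projection origins (longitude, latitude) in decimal degrees, from the table above.
ORIGINS = {
    "Africa": (20.0, 5.0),
    "Asia": (100.0, 45.0),
    "Australasia": (135.0, -15.0),
    "Europe": (20.0, 55.0),
    "North America": (-100.0, 45.0),
    "South America": (-60.0, -15.0),
}

def laea_forward(lon_deg, lat_deg, continent):
    """Project geographic degrees to LAEA meters on a sphere (assumed zero false origin)."""
    lon0, lat0 = (math.radians(v) for v in ORIGINS[continent])
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    dlon = lon - lon0
    denom = (1.0 + math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(dlon))
    k = math.sqrt(2.0 / denom)  # undefined at the point opposite the origin
    x = R * k * math.cos(lat) * math.sin(dlon)
    y = R * k * (math.cos(lat0) * math.sin(lat)
                 - math.sin(lat0) * math.cos(lat) * math.cos(dlon))
    return x, y
```

The projection origin itself maps to (0, 0), and a point one degree due north of the origin maps to a northing of 2 R sin(0.5°).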

3.1.2. Identify Natural Sink Features

All continents contain some closed basins, that is, drainage basins with no natural outlet to the sea. In processing the HYDRO1k DEM to replicate natural flow patterns, techniques were developed to (1) identify which sink features in the DEM are, indeed, natural features and (2) preserve these sink features during the processing. Identification of the natural sinks in the DEM began with the creation of a "sink layer" containing all sink features in the projected GTOPO30 DEM. This sink layer was then thresholded to extract only sinks with a surface area greater than a specified minimum. The result was used as a "first cut" at identifying the natural sink features.

3.1.3. Filling the DEM

To allow filling of the DEM using standard GIS techniques while still maintaining the sinks identified in step 3.1.2., the identified sinks are "seeded" by placing a NODATA point at the bottom of each sink. Since the standard GIS implementation of the hydrologic filling technique allows flow only off the edge of the DEM or to NODATA points, this procedure "tricks" the GIS into letting water flow to the sink. All spurious sinks, those not identified as potential natural features in 3.1.2, are removed.
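
The effect of seeding can be illustrated with a small Python sketch. It uses a priority-flood fill, which is a substitute chosen for brevity and not the GIS fill routine used to build HYDRO1k; the function name fill_dem is hypothetical. As in the description above, water may leave the grid only across its edge or into a NODATA cell.

```python
import heapq

NODATA = -9999  # mask value used by the HYDRO1k rasters
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def fill_dem(dem):
    """Return a filled copy of dem (list of rows); outlets are edge and NODATA cells."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    done = [[False] * cols for _ in range(rows)]
    heap = []

    def touches_nodata(r, c):
        return any(
            0 <= r + dr < rows and 0 <= c + dc < cols and dem[r + dr][c + dc] == NODATA
            for dr, dc in NEIGHBORS
        )

    # Start from every cell that can drain directly: grid edge or next to a seed.
    for r in range(rows):
        for c in range(cols):
            if dem[r][c] == NODATA:
                done[r][c] = True
            elif r in (0, rows - 1) or c in (0, cols - 1) or touches_nodata(r, c):
                heapq.heappush(heap, (dem[r][c], r, c))
                done[r][c] = True

    # Grow inward from the lowest outlet, raising any cell that lies below its spill level.
    while heap:
        level, r, c = heapq.heappop(heap)
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not done[nr][nc]:
                done[nr][nc] = True
                filled[nr][nc] = max(filled[nr][nc], level)
                heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled
```

With no seed, a depression surrounded by higher ground is raised to its spill elevation. With a NODATA seed at its bottom, the same depression is left untouched.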

3.1.4. Verification of the DEM

Following filling of the DEM, initial streamline and basin data sets are generated for use in the verification of the DEM. Flow direction and flow accumulation grids are generated and the vector stream lines and basin boundaries are produced. The streamlines and basins thus derived are compared against existing digital data. In most cases, the Digital Chart of the World (DCW) drainage cover was used for comparison (Defense Mapping Agency, 1992; Danko, 1992). However, all available map sources were used. Comparison of the generated streamlines with mapped hydrography allows identification of essentially two types of errors in the DEM:

(1) Errors of omission or inclusion of natural sink features. Examination of mapped hydrography often serves to identify whether or not the first-pass identification of the natural sink features was adequate. In the case of an error of omission, the newly identified sink feature is "seeded" in the DEM; in the case of inclusion, the "seeded" sink is removed ("unseeded").

(2) Errors in the DEM that prevent proper flow across its surface. These errors can be caused by the DEM generation or resampling techniques, or simply by the 1-km horizontal or 1-m vertical resolution of the DEM. Comparison with mapped hydrography serves to identify locations where the generated streamlines or basin boundaries deviate from it. If the source of the difference between the two proves to be the DEM, the DEM is edited to guarantee that flow progresses in the required direction. These types of DEM edits usually involve only small changes in the elevation of one or two pixels.

The procedures in 3.1.3. and 3.1.4. are repeated until the DEM is able to produce streamlines and basins that adequately match mapped hydrography.

3.2. Generation of Derivative Raster Data Sets

Following generation of the hydrologically correct DEM, the final versions of the additional derivative data layers are produced. Along with the hydrologically correct DEM, the following five raster data layers are developed using standard GIS techniques. All derivative raster data layers were produced using ARC/INFO’s GRID module (ESRI, 1992).

3.2.1. Aspect

The aspect data set describes the direction of maximum rate of change in the elevations between each cell and its eight neighbors. It can essentially be thought of as the slope direction. It is expressed in positive integer degrees from 0 to 360, measured clockwise from north. Cells of zero slope (flat areas) are assigned an aspect value of -1.

3.2.2. Flow Directions

The flow direction data layer defines the direction of flow from each cell in the DEM to its steepest down-slope neighbor. Values of flow direction vary from 1 to 255. Defined flow directions follow the convention adopted by ARC/INFO's flow direction implementation, shown here for the eight neighbors of the center cell (X):

 32    64   128
 16     X     1
  8     4     2

Cells with undefined direction of flow represent sinks and have flow directions that are simple combinations of their neighbors' flow direction values.
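
The coding convention can be illustrated with a short Python sketch that picks the steepest down-slope neighbor of one cell. The function name d8_direction is hypothetical, row 0 is assumed to be the northern edge of the grid, and returning 0 for a cell with no lower neighbor is a simplification of this sketch, not the combined-value behavior described above.

```python
import math

# (row offset, column offset) -> direction code, matching the layout above.
D8_CODES = {
    (-1, -1): 32, (-1, 0): 64, (-1, 1): 128,
    (0, -1): 16,               (0, 1): 1,
    (1, -1): 8,   (1, 0): 4,   (1, 1): 2,
}

def d8_direction(dem, r, c, cell_size=1000.0):
    """Return the code of the steepest descent from (r, c), or 0 if no neighbor is lower."""
    best_code, best_drop = 0, 0.0
    for (dr, dc), code in D8_CODES.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(dem) and 0 <= nc < len(dem[0]):
            # Diagonal neighbors are farther away, so divide by the true distance.
            distance = cell_size * math.hypot(dr, dc)
            drop = (dem[r][c] - dem[nr][nc]) / distance
            if drop > best_drop:
                best_code, best_drop = code, drop
    return best_code
```

For the grid [[9, 8, 7], [8, 5, 4], [7, 4, 1]], the center cell drains to the southeast neighbor and receives code 2.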

3.2.3. Flow Accumulations

The flow accumulation data layer defines the amount of upstream area draining into each cell. It is essentially a measure of the upstream catchment area. The flow direction layer is used to define which cells flow into the target cell. Since the cell size of the HYDRO1k data set is 1 km, the flow accumulation value translates directly into drainage areas in square kilometers. Values range from 0 at topographic highs to very large numbers (on the order of millions of cells) at the mouths of large rivers.
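
A minimal Python sketch of this calculation follows. It assumes a flow direction grid that uses the eight single-direction codes from section 3.2.2 and treats any other value as a cell that drains nowhere; the function name flow_accumulation is hypothetical and the code is not the GRID implementation.

```python
# Direction code -> (row offset, column offset), row 0 being the northern edge.
OFFSETS = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
           16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}

def flow_accumulation(fdir):
    """Return, for each cell, the number of upstream cells that drain into it."""
    rows, cols = len(fdir), len(fdir[0])

    def downstream(r, c):
        offset = OFFSETS.get(fdir[r][c])
        if offset is None:
            return None
        nr, nc = r + offset[0], c + offset[1]
        return (nr, nc) if 0 <= nr < rows and 0 <= nc < cols else None

    # Count how many neighbors drain directly into each cell.
    inflows = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            target = downstream(r, c)
            if target is not None:
                inflows[target[0]][target[1]] += 1

    # Process cells from the ridges downward, passing totals to the receiving cell.
    acc = [[0] * cols for _ in range(rows)]
    stack = [(r, c) for r in range(rows) for c in range(cols) if inflows[r][c] == 0]
    while stack:
        r, c = stack.pop()
        target = downstream(r, c)
        if target is None:
            continue
        tr, tc = target
        acc[tr][tc] += acc[r][c] + 1
        inflows[tr][tc] -= 1
        if inflows[tr][tc] == 0:
            stack.append(target)
    return acc
```

With 1-km cells, each returned count can be read directly as an upstream area in square kilometers.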

3.2.4. Slope

The slope data layer describes the maximum change in the elevations between each cell and its eight neighbors. The slope is expressed in integer degrees of slope between 0 and 90.
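
One common way to estimate slope from a cell and its eight neighbors is Horn's 3 x 3 finite-difference method. The sketch below assumes that method; this document does not state which estimator GRID applied, so the function (hypothetical name horn_slope_degrees) should be read as an illustration and not as a reproduction of the HYDRO1k slope layer.

```python
import math

def horn_slope_degrees(window, cell_size=1000.0):
    """Slope in integer degrees for the center of a 3 x 3 elevation window (Horn's method)."""
    (a, b, c), (d, _center, f), (g, h, i) = window
    # Weighted differences between the east and west columns, and south and north rows.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_size)
    return round(math.degrees(math.atan(math.hypot(dz_dx, dz_dy))))
```

A surface that rises 1,000 m per 1,000 m cell toward the east gives a slope of 45 degrees, and a flat window gives 0.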

3.2.5. Shaded Relief Representation

The shaded relief representation of the DEM was generated using the Slope-Aspect Index (SAI function) with a vertical exaggeration of 15. The values of the shaded relief representation vary from 0 to 255.

3.3. Generation of Derivative Vector Data Sets

The stream line and basin data in the HYDRO1k data set are distributed as vector layers.

3.3.1. Stream Lines

The stream line data layer distributed with the HYDRO1k data set is derived from the flow accumulation and flow direction layers. Cells with upstream drainage areas greater than 1,000 km^2 are selected from the flow accumulation layer and processed through the STREAMLINK function. The resulting links are attributed with the maximum flow accumulation occurring within that link and the result is vectorized using the STREAMLINE function. These procedures result in a vector data layer of streamlines with each segment of stream attributed with the upstream contributing drainage area. The vector streamlines are attributed with the following fields:

Flowacc = The flow accumulation value of the to-node of the stream segment (10^-3 km^2). This value corresponds directly with the watershed contributing area upstream of the to-node.

Level1 to Level6 = The Pfafstetter units in which the stream segment lies. Depending on the density of the Pfafstetter subdivision (Africa has six levels while North America has five), one field will exist for each level.

Stream_type1 to Stream_type6 = Streams corresponding to main-stem streams at each level of Pfafstetter subdivision are tagged with a one (‘1’) in the appropriate stream_type field. For example, mainstem streams from Pfafstetter Level 1 would have stream_type1 = 1; mainstem streams corresponding to Pfafstetter Level 2 would have stream_type2 = 1.

3.3.2. Drainage Basin Boundaries

The drainage basins distributed with the HYDRO1k data set are derived using the vector streamlines along with the flow direction layer. The basins are seeded following procedures first articulated by Otto Pfafstetter, a Brazilian engineer, and adapted for use in the HYDRO1k data set (Verdin, 1997). Each polygon in the basin data set has been tagged with a Pfafstetter code uniquely identifying each sub-basin. The five or six digit Pfafstetter codes assigned to each basin carry basin linkage information. This permits determination of basin interconnectedness through simple examination of the Pfafstetter code. A complete description of the Pfafstetter system and its usefulness can be found in Verdin & Verdin.
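
As a hedged illustration of how linkage can be read from the codes, the Python sketch below applies the rule commonly described for Pfafstetter numbering: odd digits mark interbasins along the main stem, even digits mark tributaries, and digits increase in the upstream direction. The function name is_downstream is hypothetical, both codes are assumed to have the same number of digits, and closed basins (digit 0) are not handled.

```python
def is_downstream(candidate, subject):
    """Return True if basin `candidate` lies on the flow path below basin `subject`."""
    cand, subj = str(candidate), str(subject)
    if len(cand) != len(subj):
        raise ValueError("codes must have the same number of digits")
    if "0" in cand or "0" in subj:
        raise ValueError("closed basins (digit 0) are not handled in this sketch")
    if cand == subj:
        return False
    # Find the first digit at which the two codes differ.
    first = next(i for i in range(len(subj)) if cand[i] != subj[i])
    remainder = cand[first:]
    # The candidate must sit lower on the shared stem and be made of main-stem units only.
    return cand[first] < subj[first] and all(int(ch) % 2 == 1 for ch in remainder)
```

For example, water leaving basin 8835 passes through 8833 and 8811 but not through the tributary basin 8834.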

The drainage basin boundary polygons carry the following attributes:

Level_1 to Level_6 = Pfafstetter units of each polygon

Level_1_Name to Level_6_Name = Drainage basin names

4.0. Data Formats

4.1. Vector Data Formats

The vector data sets distributed with HYDRO1k, stream lines and basins, are made available in ARC/INFO Export format (.E00 extension).

4.2. Raster Data Formats

The six raster data layers distributed for each continent are being distributed as simple binary raster data. Each raster data layer is provided as four files, with the extension of each file defining the file type.

File Extension    File Type
.bil              Raster Data File
.hdr              Header File
.blw              World File
.sta              Statistics File

4.2.1. Image File (.bil)

The raster data for each layer is provided as 16-bit signed integer data in a simple binary raster format. There are no header or trailer bytes embedded in the image. The data are stored in row major order (all the data for row 1, followed by all the data for row 2, etc.).

4.2.2. Header File (.hdr)

The raster data header file is an ASCII text file containing size and coordinate information for the layer. Many standard software packages require the .hdr file to provide important geo-referencing information for the image. The following keywords are used in the header file:

BYTEORDER: byte order in which image pixel values are stored
    M = Motorola byte order (most significant byte first)
LAYOUT: organization of the bands in the file
    BIL = band interleaved by line (note: the raster layers are all single band images)
NROWS: number of rows in the image
NCOLS: number of columns in the image
NBANDS: number of spectral bands in the image (1)
NBITS: number of bits per pixel (16)
BANDROWBYTES: number of bytes per band per row (twice the number of columns for a 16-bit image)
TOTALROWBYTES: total number of bytes of data per row (twice the number of columns for a single band 16-bit image)
BANDGAPBYTES: the number of bytes between bands in a BSQ format image (0)
NODATA: value used for masking purposes (-9999)
ULXMAP: x location of the center of the upper-left pixel (meters in Lambert Azimuthal Equal Area projection)
ULYMAP: y location of the center of the upper-left pixel (meters)
XDIM: x dimension of a pixel (meters)
YDIM: y dimension of a pixel (meters)
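
A header of this kind can be read with a few lines of Python. The sketch assumes each line holds one keyword followed by whitespace and its value, which is the usual layout for this type of header but is not spelled out above; the function name parse_hdr is hypothetical.

```python
INT_KEYS = ("NROWS", "NCOLS", "NBANDS", "NBITS", "BANDROWBYTES",
            "TOTALROWBYTES", "BANDGAPBYTES", "NODATA")
FLOAT_KEYS = ("ULXMAP", "ULYMAP", "XDIM", "YDIM")

def parse_hdr(text):
    """Parse 'KEYWORD value' lines into a dict, converting numeric keywords."""
    header = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        header[parts[0].upper()] = parts[1].strip()
    for key in INT_KEYS:
        if key in header:
            header[key] = int(header[key])
    for key in FLOAT_KEYS:
        if key in header:
            header[key] = float(header[key])
    return header
```

A quick sanity check after parsing is that BANDROWBYTES equals NCOLS times NBITS divided by 8.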

4.2.3. World File (.blw)

The world file is an ASCII text file containing coordinate information. It is used by some packages for geo-referencing of image data.

4.2.4. Statistics File (.sta)

The statistics file is an ASCII text file that lists the band number, minimum value, maximum value, mean value, and standard deviation of the values in the raster data file.

5.0. Data Distribution

HYDRO1k data for each continent are distributed electronically as tar files. The data files are identified by a two-letter continental identifier according to the following scheme:

Two-letter Identifier    Continent
AF                       Africa
AS                       Asia
AU                       Australasia
EU                       Europe
NA                       North America
SA                       South America

Users may obtain the entire HYDRO1k data set for a continent (all eight data layers) or select individual layers. In the case of the entire data set, all data layers (with the BIL files compressed using the gzip command) have been combined into one file with the Unix "tar" command. For the individual data layers, the raster data sets (compressed image data and ancillary files) have been combined into a single file with the tar function. The vector data sets are available as compressed (gzip) export files. As an example of the naming convention used, the North American data sets that are available are:

Na.tar           Tar file containing all the North American data layers
Na_asp.tar       Tar file containing the aspect data layer (compressed bil file and three ancillary files)
Na_bas.e00.gz    Vector basin data layer in compressed ARC/INFO Export format
Na_dem.tar       Tar file with DEM data layer (compressed bil file and three ancillary files)
Na_fd.tar        Tar file with flow direction data layer
Na_fa.tar        Tar file with flow accumulation data layer
Na_slope.tar     Tar file with slope data layer
Na_sr.tar        Tar file with shaded relief data layer
Na_str.e00.gz    Vector streams data layer in compressed ARC/INFO Export format

As well as being available via a web page interface, the HYDRO1k data sets are available electronically through an Internet anonymous File Transfer Protocol (FTP) account at the EROS Data Center (at no cost).

To access this account:

1. FTP to edcftp.cr.usgs.gov
2. Enter anonymous at the Name prompt.
3. Enter your email address at the Password prompt.
4. Change to the /pub/data/gtopo30hydro subdirectory
5. Enter binary to set the transfer type.
6. Use get or mget to retrieve the desired files.

For assistance and information contact:

EDC DAAC User Services
EROS Data Center
Sioux Falls, SD 57198 USA
Tel: 605-594-6116 (7:30 am to 4:00 pm CT)
Fax: 605-594-6963 (24 hours)
Internet: edc@eos.nasa.gov (24 hours)

To use the HYDRO1k data files, the individual data files must first be extracted from the tar files. Within the tar files, the image data files (.bil) are compressed. These files, along with the compressed vector export files, must be uncompressed. If you do not have the gzip and tar utilities, they can be obtained from the following locations:
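
Where Python is available, the same two steps can be performed without separate utilities. The following is a minimal sketch using the standard tarfile and gzip modules; the function name unpack_layer is hypothetical, and the sketch assumes the compressed members inside the tar file carry a .gz extension.

```python
import gzip
import shutil
import tarfile
from pathlib import Path

def unpack_layer(tar_path, dest_dir):
    """Extract a tar file into dest_dir, then decompress every .gz file found there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        tar.extractall(dest)
    unpacked = []
    for gz_path in sorted(dest.rglob("*.gz")):
        out_path = gz_path.with_suffix("")  # drop the trailing .gz
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        unpacked.append(out_path)
    return unpacked
```

The function returns the paths of the decompressed files and leaves the original .gz files in place.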

Unix gzip:
    ftp://prep.ai.mit.edu/pub/gnu
    ftp://wuarchive.wustl.edu/systems/gnu
Macintosh gzip and tar:
    ftp://mirrors.aol.com/pub/mac/util/compression
        macgzip0.3b2.sit.hqx
        suntar2.03.cpt.hqx
DOS gzip and tar:
    ftp://prep.ai.mit.edu/pub/gnu
        gzip-1.2.4.tar
    ftp://ftp.uu.net/systems/ibmpc/msdos/pcroute
        tar.exe

6.0. Notes and Hints for HYDRO1k Users

Because the image (.bil) data are stored in a 16-bit binary format, users must be aware of how the bytes are addressed on their computers. The data are provided in Motorola byte order, which stores the most significant byte first ("big endian"). Systems such as Sun SPARC and Silicon Graphics workstations use the Motorola byte order. The Intel byte order, which stores the least significant byte first ("little endian"), is used on DEC Alpha systems and most PCs. Users with systems that address bytes in the Intel byte order may have to "swap bytes" of the BIL data unless their application software performs the conversion during ingest. The statistics file (.sta) provided for each data set gives the range of values in the image file, so that users can check if they have the correct values stored on their system.
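
In a language that lets the byte order be stated explicitly, no manual swap is needed. The Python sketch below decodes big-endian signed 16-bit samples with the standard struct module on any host; the function name read_bil is hypothetical, and the row and column counts are expected to come from the header file.

```python
import struct

def read_bil(raw, nrows, ncols):
    """Decode big-endian signed 16-bit samples (bytes) into a list of rows."""
    count = nrows * ncols
    if len(raw) != 2 * count:
        raise ValueError("byte count does not match NROWS x NCOLS x 2")
    # '>' forces Motorola (big-endian) order regardless of the host; 'h' is a signed 16-bit value.
    values = struct.unpack(">%dh" % count, raw)
    return [list(values[r * ncols:(r + 1) * ncols]) for r in range(nrows)]
```

The minimum and maximum of the decoded values can then be compared with the range given in the statistics file.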

Users of ARC/INFO or ArcView can display the image data directly. However, if a user needs access to the actual pixel values for analysis in ARC/INFO, the image must be converted to an ARC/INFO grid with the command IMAGEGRID. IMAGEGRID does not support conversion of signed image data; therefore, the negative 16-bit image values will not be interpreted correctly. After running IMAGEGRID, an easy fix can be accomplished using the following formula in GRID:

out_grid = con(in_grid >= 32768, in_grid - 65536, in_grid)

The converted grid will then have the negative values properly represented, and the statistics of the grid should match those listed in the .sta file. If desired, the -9999 ocean mask values in the grid could then be set to NODATA with the SETNULL function.
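
For users working outside ARC/INFO, the same correction can be sketched in Python. The function names to_signed16 and fix_grid are hypothetical, and representing the masked ocean cells as None is a choice made for this illustration only.

```python
def to_signed16(value):
    """Reinterpret an unsigned 16-bit value (0..65535) as a signed one."""
    return value - 65536 if value >= 32768 else value

def fix_grid(grid, nodata=-9999):
    """Apply the sign correction to every cell and replace the mask value with None."""
    fixed = []
    for row in grid:
        signed = [to_signed16(v) for v in row]
        fixed.append([None if v == nodata else v for v in signed])
    return fixed
```

An unsigned value of 55537 becomes -9999 after the correction and is then masked.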

7.0. Summary

The HYDRO1k data set provides many of the derivative products useful in earth science applications. The hydrologically correct DEM and ancillary data layers are useful in studies of earth systems including watershed analysis, landform studies and global change scenarios. Development of a standard set of data layers minimizes duplication of effort and will provide consistent global coverage.

8.0. References

Danielson, J.J., 1996. Delineation of drainage basins from 1 km African digital elevation data. In: Pecora Thirteen, Human Interactions with the Environment - Perspectives from Space, Sioux Falls, South Dakota, August 20-22, 1996.

Danko, D.M., 1992. The digital chart of the world. GeoInfo Systems, 2:29-36.

Defense Mapping Agency, 1992. Development of the Digital Chart of the World. Washington, D.C., U.S. Government Printing Office.

ESRI, 1992, "Cell based modeling with GRID", ESRI, Inc., Redlands, California.

Steinwand, D.R., Hutchinson, J.A., and Snyder, J.P., 1995. Map projections for global and continental data sets and an analysis of pixel distortion caused by reprojection. Photogrammetric Engineering and Remote Sensing, v. 61, p. 1,487-1,497.

Verdin, K.L., and Greenlee, S.K., 1996. Development of continental scale digital elevation models and extraction of hydrographic features. In: Proceedings, Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, New Mexico, January 21-26, 1996. National Center for Geographic Information and Analysis, Santa Barbara, California.

Verdin, K.L., 1997. A System for Topologically Coding Global Drainage Basins and Stream Networks. In: Proceedings, 17th Annual ESRI Users Conference, San Diego, California, July 1997.

9.0. Disclaimers

Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government. Please note that some U.S. Geological Survey (USGS) information contained in this data set and documentation may be preliminary in nature and presented prior to final review and approval by the Director of the USGS. This information is provided with the understanding that it is not guaranteed to be correct or complete and conclusions drawn from such information are the sole responsibility of the user.