Esri Shapefile .SHP File Description

Esri Shapefiles are typically used in Surfer as base maps. Compressed Shapefiles (ZIP, TAR, TAR.GZ, TGZ) can be imported directly into Surfer. However, if the compressed archive contains more than one Shapefile, only the first Shapefile is imported.
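To see which Shapefile a ZIP archive would offer first, the archive listing can be inspected before import. The following is a minimal sketch using Python's standard zipfile module. The function name first_shapefile_in_zip is hypothetical, and "first" here means first in archive order, which is an assumption and may not match the order Surfer uses.

```python
import zipfile


def first_shapefile_in_zip(archive):
    """Return the name of the first .shp member in archive order, or None.

    `archive` may be a path or a binary file-like object.
    Assumption: "first" means the first entry in the ZIP listing.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".shp"):
                return name
    return None
```

If the archive holds several Shapefiles, a safer workflow is to extract the one that is needed and import it on its own.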

Esri Shapefiles are in a binary file format (i.e., they can't be created or modified with a text editor or word processor) that is compatible with Arc/Info, Arc/View, and other Esri application programs. This format is used to store spatial information including boundary objects such as areas, curves, and points. Spatial information is only concerned with the location of objects in space (i.e., their coordinates) and not with their attributes (such as line or fill style, marker symbol used, text labels, etc.).

Four types of files are produced with each export:

Filename Extension    Description
.SHP                  Contains the coordinates of each object in the drawing.
.SHX                  Contains the file offset of each object in the .SHP file.
.DBF                  Contains the attribute text associated with each object in the .SHP file.
.CPG                  Contains the Unicode code page number.

The records in the .SHP, .SHX, and .DBF files correspond to each other in sequence. That is, the first record in the .SHP file corresponds to the first record in the .SHX and .DBF files, and so on. The fields in the .SHP and .SHX files do not all share one byte order: some are big-endian and others are little-endian. An implementer of the file formats must therefore read each field with the endianness specified for it.

Overview

A shapefile is a digital vector format for storing geometric location and associated attribute information. This format lacks the capacity to store topological information. The shapefile format was introduced with ArcView GIS version 2 in the early 1990s. It is now possible to read and write shapefiles using a variety of free and non-free programs.

Shapefiles are simple because they store primitive geometrical data types: points, lines, and polygons. These primitives are of limited use without attributes that specify what they represent. Therefore, a table of records stores the properties/attributes for each primitive shape in the shapefile. Shapes (points, lines, and polygons) combined with data attributes can represent a wide variety of geographical data, and these representations support powerful and accurate computations.

While the term "shapefile" is quite common, a "shapefile" is actually a set of several files. Three individual files are normally mandatory to store the core data that comprises a shapefile. There are also a number of optional files, most of which store index data to improve performance. Each individual file should conform to the MS-DOS 8.3 file naming convention (an 8-character file name prefix, a period, and a 3-character file name suffix, such as shapefil.shp) in order to be compatible with older applications that handle shapefiles. For the same reason, all files should be located in the same folder.

Shapefiles store coordinates as X and Y values, although these often hold longitude and latitude, respectively. When working with X and Y, be sure to respect their order: longitude is stored in X and latitude in Y.

Mandatory files

.SHP - shape format; the feature geometry itself

.SHX - shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly

.DBF - attribute format; columnar attributes for each shape, in dBase III format
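Before opening a shapefile, it can be useful to confirm that the three mandatory files are present side by side. The following is a minimal sketch; the function name missing_components is hypothetical, and the case-insensitive comparison of file names is an assumption made so that ROADS.SHX is accepted next to roads.shp.

```python
from pathlib import Path

MANDATORY_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """Return the mandatory extensions that have no file next to shp_path."""
    base = Path(shp_path)
    # Collect the extensions of every sibling file that shares the base name.
    present = {
        p.suffix.lower()
        for p in base.parent.iterdir()
        if p.stem.lower() == base.stem.lower()
    }
    return [ext for ext in MANDATORY_EXTENSIONS if ext not in present]
```

An empty list means that all three mandatory files were found in the same folder.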

Optional files

.PRJ - projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format

.SBN and .SBX - a spatial index of the features

.FBN and .FBX - a spatial index of the features for shapefiles that are read-only

.AIN and .AIH - an attribute index of the active fields in a table or a theme's attribute table

.IXS - a geocoding index for read-write shapefiles

.MXS - a geocoding index for read-write shapefiles (ODB format)

.ATX - an attribute index for the .dbf file in the form of shapefile.columnname.atx (ArcGIS 8 and later)

.SHP.XML - metadata in XML format

.CPG - file containing the single value code page to be used for ANSI to Unicode translation of attribute text in associated .DBF files.

Attributes

All attributes for all polyline, polygon, and symbol objects are automatically exported to all .SHP files. For contour maps, the Z value is exported as the "ZLEVEL" attribute for all polylines in the contour map. All attributes are automatically imported.

Shapefile shape format .SHP

The main file [.SHP] contains the primary geographic reference data in the shapefile. The file consists of a single fixed length header followed by one or more variable length records. Each of the variable length records includes a record header component and a record contents component. A detailed description of the file format is given in the Esri Shapefile Technical Description.[1] This format should not be confused with the AutoCAD shape font source format, which shares the .shp extension.

The main file header is fixed at 100 bytes in length and contains 17 fields: nine 4-byte integer fields (32-bit signed integers, int32) followed by eight 8-byte double-precision floating-point fields (double):

Bytes    Type      Endianness    Usage
0-3      int32     big           File code (always hex value 0x0000270a)
4-23     int32     big           Unused; five uint32
24-27    int32     big           File length (in 16-bit words, including the header)
28-31    int32     little        Version
32-35    int32     little        Shape type (see reference below)
36-67    double    little        Minimum bounding rectangle (MBR) of all shapes contained within the shapefile; four doubles in the following order: min X, min Y, max X, max Y
68-83    double    little        Range of Z; two doubles in the following order: min Z, max Z
84-99    double    little        Range of M; two doubles in the following order: min M, max M
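The header layout above can be decoded with Python's standard struct module. The following is a minimal sketch rather than a complete reader; the names ShpHeader and parse_shp_header are hypothetical. It converts the file length from 16-bit words to bytes and reads the big-endian and little-endian fields separately.

```python
import struct
from typing import NamedTuple, Tuple


class ShpHeader(NamedTuple):
    file_code: int
    file_length_bytes: int
    version: int
    shape_type: int
    bbox: Tuple[float, ...]      # (min X, min Y, max X, max Y)
    z_range: Tuple[float, ...]   # (min Z, max Z)
    m_range: Tuple[float, ...]   # (min M, max M)


def parse_shp_header(buf: bytes) -> ShpHeader:
    """Decode the 100-byte main file header of a .SHP (or .SHX) file."""
    if len(buf) < 100:
        raise ValueError("the header needs at least 100 bytes")
    # Bytes 0-27: seven big-endian int32 values
    # (file code, five unused values, file length in 16-bit words).
    file_code, *_unused, length_words = struct.unpack(">7i", buf[:28])
    if file_code != 0x0000270A:
        raise ValueError("unexpected file code: %#x" % file_code)
    # Bytes 28-99: two little-endian int32 values, then eight doubles.
    version, shape_type, *d = struct.unpack("<2i8d", buf[28:100])
    return ShpHeader(file_code, length_words * 2, version, shape_type,
                     tuple(d[0:4]), tuple(d[4:6]), tuple(d[6:8]))
```

A typical call reads the first 100 bytes of the file in binary mode and passes them to parse_shp_header.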

The file then contains any number of variable-length records. Each record is prefixed with a record-header of 8 bytes:

Bytes    Type     Endianness    Usage
0-3      int32    big           Record number
4-7      int32    big           Record length (in 16-bit words)

Following the record header is the actual record:

Bytes    Type     Endianness    Usage
0-3      int32    little        Shape type (see reference below)
4-       -        -             Shape content
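The record structure can be walked sequentially once the 100-byte file header has been skipped. The following is a minimal sketch; the function name iter_shp_records is hypothetical. It relies on two details from the Esri Shapefile Technical Description: the record length counts only the record contents (not the 8-byte record header), and the shape type at the start of the contents is little-endian.

```python
import struct
from typing import Iterator, Tuple


def iter_shp_records(buf: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (record_number, shape_type, shape_content) for each record."""
    pos = 100  # records start right after the fixed-length file header
    while pos + 8 <= len(buf):
        # Record header: two big-endian int32 values.
        rec_num, length_words = struct.unpack(">2i", buf[pos:pos + 8])
        start = pos + 8
        end = start + length_words * 2  # 16-bit words to bytes
        content = buf[start:end]
        if len(content) < 4:
            raise ValueError("record %d is truncated" % rec_num)
        # The record contents begin with a little-endian shape type.
        (shape_type,) = struct.unpack("<i", content[:4])
        yield rec_num, shape_type, content[4:]
        pos = end
```

The remaining bytes of each record are returned undecoded, because their layout depends on the shape type.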

The variable length record contents depend on the shape type. The following are the possible shape types:

Value    Shape Type     Fields
0        Null shape     None
1        Point          X, Y
3        Polyline       MBR, Number of parts, Number of points, Parts, Points
5        Polygon        MBR, Number of parts, Number of points, Parts, Points
8        MultiPoint     MBR, Number of points, Points
11       PointZ         X, Y, Z, M
13       PolylineZ      Mandatory: MBR, Number of parts, Number of points, Parts, Points, Z range, Z array; Optional: M range, M array
15       PolygonZ       Mandatory: MBR, Number of parts, Number of points, Parts, Points, Z range, Z array; Optional: M range, M array
18       MultiPointZ    Mandatory: MBR, Number of points, Points, Z range, Z array; Optional: M range, M array
21       PointM         X, Y, M
23       PolylineM      Mandatory: MBR, Number of parts, Number of points, Parts, Points; Optional: M range, M array
25       PolygonM       Mandatory: MBR, Number of parts, Number of points, Parts, Points; Optional: M range, M array
28       MultiPointM    Mandatory: MBR, Number of points, Points; Optional: M range, M array
31       MultiPatch     Mandatory: MBR, Number of parts, Number of points, Parts, Part types, Points, Z range, Z array; Optional: M range, M array

In practice, shapefiles containing Point, Polyline, or Polygon shapes are by far the most common. The "Z" types are three-dimensional. The "M" types contain a user-defined measurement associated with each point. Three-dimensional shapefiles are rather uncommon, and the measurement functionality has been largely superseded by more robust databases used in conjunction with the shapefile data.
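As an illustration of the field lists for the most common types, the following sketch decodes the contents of a Point record and of a Polyline or Polygon record (the bytes that follow the shape type). The function names parse_point and parse_poly are hypothetical. The field widths are taken from the Esri Shapefile Technical Description rather than from this page: the MBR is four doubles, the counts are int32 values, each entry in Parts is the index of the first point of that part, and every value is little-endian.

```python
import struct


def parse_point(content: bytes):
    """Decode Point content: X and Y as little-endian doubles."""
    return struct.unpack("<2d", content[:16])


def parse_poly(content: bytes):
    """Decode Polyline (3) or Polygon (5) content into (mbr, parts)."""
    mbr = struct.unpack("<4d", content[:32])
    num_parts, num_points = struct.unpack("<2i", content[32:40])
    parts_end = 40 + 4 * num_parts
    # Parts: index of the first point of each part.
    part_starts = struct.unpack("<%di" % num_parts, content[40:parts_end])
    flat = struct.unpack("<%dd" % (2 * num_points),
                         content[parts_end:parts_end + 16 * num_points])
    points = list(zip(flat[0::2], flat[1::2]))
    # Slice the flat point list into one list of (X, Y) pairs per part.
    bounds = list(part_starts) + [num_points]
    parts = [points[bounds[i]:bounds[i + 1]] for i in range(num_parts)]
    return mbr, parts
```

The "Z" and "M" variants append further arrays after the points, which this sketch does not read.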

Shapefile shape index format (.shx)

The shapefile index contains the same 100-byte header as the [.SHP] file, followed by any number of 8-byte fixed-length records which consist of the following two fields:

Bytes    Type     Endianness    Usage
0-3      int32    big           Record offset (in 16-bit words)
4-7      int32    big           Record length (in 16-bit words)

Using this index, it is possible to seek backwards in the shapefile by seeking backwards first in the shape index (which is possible because it uses fixed-length records), reading the record offset, and using that to seek to the correct position in the [.SHP] file. It is also possible to seek forwards an arbitrary number of records by using the same method.
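The seek procedure described above can be sketched as follows. The function names shx_entry and read_record_content are hypothetical. The sketch takes the first index field as the offset of the record header from the start of the .SHP file and the second as the length of the record contents, both in 16-bit words, as given in the Esri Shapefile Technical Description.

```python
import struct


def shx_entry(shx: bytes, index: int):
    """Return (offset_bytes, length_bytes) for a zero-based record index."""
    pos = 100 + 8 * index  # fixed-length entries follow the 100-byte header
    if index < 0 or pos + 8 > len(shx):
        raise IndexError("no index entry for record %d" % index)
    offset_words, length_words = struct.unpack(">2i", shx[pos:pos + 8])
    return offset_words * 2, length_words * 2


def read_record_content(shp: bytes, shx: bytes, index: int) -> bytes:
    """Jump straight to one record in the .SHP data using the index."""
    offset, length = shx_entry(shx, index)
    start = offset + 8  # skip the 8-byte record header
    return shp[start:start + length]
```

Because every index entry is 8 bytes long, the position of any entry is computed directly, with no need to scan the preceding records.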

Shapefile attribute format .DBF

Attributes for each shape are stored in the xBase (dBase) format, which has an open specification.
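The attribute table structure can be inspected by reading the .DBF header. The following is a minimal sketch that assumes the commonly documented dBase III header layout, which is not described on this page: a 32-byte header holding the record count, header size, and record size as little-endian values, followed by 32-byte field descriptors terminated by a 0x0D byte. The function name parse_dbf_header is hypothetical.

```python
import struct


def parse_dbf_header(buf: bytes):
    """Return (record_count, record_size, fields) from a .DBF header.

    Each field is a (name, type, length, decimal_count) tuple.
    Assumes the dBase III layout; other xBase variants may differ.
    """
    # Bytes 4-7: record count; 8-9: header size; 10-11: record size.
    num_records, header_len, record_len = struct.unpack("<IHH", buf[4:12])
    fields = []
    pos = 32  # field descriptors start after the 32-byte header
    while pos + 32 <= min(header_len, len(buf)) and buf[pos] != 0x0D:
        name = buf[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
        field_type = chr(buf[pos + 11])
        length, decimals = buf[pos + 16], buf[pos + 17]
        fields.append((name, field_type, length, decimals))
        pos += 32
    return num_records, record_len, fields
```

The record count returned here should match the number of records in the .SHP and .SHX files, since the three files correspond record by record.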

Shapefile attribute format .CPG

The .CPG file contains the single-value code page to be used for ANSI-to-Unicode translation of attribute text in the associated .DBF file.

Shapefile projection format .PRJ

The projection information contained in the [.PRJ] file is critical in order to understand the data contained in the [.SHP] file correctly. Although it is technically optional, it is most often provided, as it is not necessarily possible to guess the projection of any given points. The file is stored in well-known text (WKT) format.

Some typical information contained in the [.PRJ] file is:

  • Geographic coordinate system

  • Datum (geodesy)

  • Spheroid

  • Prime meridian

  • Map projection

  • Units used

  • Parameters necessary to use the map projection, for example:

      • Latitude of origin

      • Scale factor

      • Central meridian

      • False northing

      • False easting

      • Standard parallels
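Because the .PRJ file is plain text, a quick check can tell whether it describes a geographic or a projected coordinate system. The following is a minimal sketch that only looks at the outermost WKT keyword. The function name classify_prj is hypothetical, and the keywords GEOGCS and PROJCS are the ones commonly used in WKT written by Esri software, which is an assumption about the file being read.

```python
def classify_prj(wkt: str) -> str:
    """Classify .PRJ text by its outermost well-known text keyword."""
    # The keyword is everything before the first opening bracket.
    keyword = wkt.strip().split("[", 1)[0].strip().upper()
    if keyword == "GEOGCS":
        return "geographic"
    if keyword == "PROJCS":
        return "projected"
    return "unknown"
```

A full interpretation of the datum, units, and projection parameters requires a real WKT parser; this sketch does not attempt it.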

Shapefile spatial index format (.sbn)

This is a binary spatial index file, which is used only by Esri software. The format is not documented, and is not implemented by other vendors. The [.SBN] file is not strictly necessary, since the [.SHP] file contains all of the information necessary to successfully parse the spatial data.

Limitations

Topology and shapefiles

Shapefiles do not have the ability to store topological information. ArcInfo coverages and Personal/File/Enterprise Geodatabases do have the ability to store feature topology.

Spatial representation

The edges of a polyline or polygon are defined using points, which can give them a jagged appearance at higher resolutions. Additional points are required to produce smooth shapes, which means storing considerably more data than, for example, Bézier curves, which capture the same complexity with far fewer points. Currently, none of the shapefile types supports Bézier curves.

Data storage

Unlike most databases, the attribute format is based on the older xBASE standard, which cannot store null values in its fields. This limitation can make the storage of data in the attributes less flexible. In ArcGIS products, values that should be null are instead replaced with a 0 (without warning), which can make the data misleading. This problem is addressed in ArcGIS products by using Esri's Personal Geodatabase offerings, one of which is based on Microsoft Access.

Mixing shape types

Each shapefile can technically store a mix of different shape types, as the shape type precedes each record, but common use of the specification dictates that only shapes of a single type can be in a single file. For example, a shapefile cannot contain both Polyline and Polygon data. Thus, well (point), river (polyline) and lake (polygon) data must be kept in three separate files.
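Whether a file follows the single-type convention can be checked by comparing the shape type of every record with the shape type in the file header. The following is a minimal sketch; the function names shape_types_in and is_single_type are hypothetical. Null shapes (type 0) are ignored, on the assumption that they may appear alongside any other type, as the Esri Shapefile Technical Description allows.

```python
import struct


def shape_types_in(shp: bytes) -> set:
    """Collect the distinct non-null shape types used by the records."""
    found = set()
    pos = 100  # skip the file header
    while pos + 12 <= len(shp):
        _rec_num, length_words = struct.unpack(">2i", shp[pos:pos + 8])
        if length_words < 2:
            raise ValueError("record too short to hold a shape type")
        (shape_type,) = struct.unpack("<i", shp[pos + 8:pos + 12])
        if shape_type != 0:  # null shapes are allowed in any file
            found.add(shape_type)
        pos += 8 + 2 * length_words
    return found


def is_single_type(shp: bytes) -> bool:
    """True if every non-null record matches the header shape type."""
    (header_type,) = struct.unpack("<i", shp[32:36])
    return shape_types_in(shp) <= {header_type}
```

A file that fails this check is best split into one shapefile per shape type before import.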

Import Options Dialog

No import options dialog is displayed.

See Also

Esri Shapefile Export Options Dialog

Esri Shapefile Export Automation Options

Esri Shapefile Import Automation Options

Export Options - Scaling

File Format Chart