Evolution of a Python Script into a Script Tool: Part 1

The script I’ll be using in this 3-part series of posts is the latest iteration of the script in my last post. It goes back a few posts, and some may find it informative to trace it back to my effective, if not so user-friendly, initial version which required me to export a text file and so on. Some may find it informative; however, others may find it tedious. So let’s pretend you don’t feel like going back through those, and let’s start fresh. I will issue a little disclaimer–the script you see below did not emerge fully formed, but went through several steps before getting here, including bouncing back and forth through the tool validation and attempts to run the tool in ArcMap and running chunks of it in IDLE…

So here’s the script I said you’d see below, but I think maybe I should go back further. All the way back. I’ll do so after this little image:

If you’ve read the earlier posts, skip to the next paragraph. For those who haven’t, here’s the abstract: Where I work, we deal with imagery that is often tiled into a grid of tiffs, sids, or ecws. We have (or we create) an index shapefile that shows where each tile is in the grid. Sometimes I have wanted to rename the tiles for a variety of reasons. I figured that just as a field in the attribute table contains the current name of each tile, I could add a field and populate the names I want them to have. Then, using a Python script, I could read the values in these two fields for each record, and rename the files in a given folder from the existing name to the desired one.

So, the bare bones version of this script is in my last post. Basically, it takes four parameters: the index shapefile, the field name corresponding to the current file names, the field corresponding to the desired one, and the folder containing the files. Then the script creates a search cursor, and then goes through the records getting the values from each of the designated fields, and renames the files from the current name to the desired one.

Easy peasy, right? It does require the fields to have exactly what you want the names to be. Now, sometimes there’s a file extension present in the attribute table, sometimes not, so I would either have to customize the script by tacking on ‘+”.tif”‘ where the old and new names are, or simply add them into the attribute table with the Field Calculator. Not much trouble, but I thought if I make it a script tool, I could maybe give some options for dealing with those file extensions, and also with prefixes like “2011_UTM_LosAngeles_”, or like “M:\11_UTM_LA_tiffs\”, that sometimes precede the actual simple name of the tile. So I added a couple fields to my test tile index so I could play around with this a bit:

Now, before I really start fleshing out the bare bones, I want to make sure the bare bones work as a script tool, right? So I right-click in the ArcToolbox window, add my custom toolbox, and then right-click the toolbox and add a script. I give it a name, a label, and a short description, and I like to check “relative paths”. I hit “Next”, and navigate to my script. I hit “Next”, and then here’s where it can start to get a little complicated, depending on the script and the parameters. Keeping the script open in IDLE for reference, I fill in the parameters. I give them the following Display Names: “Index Shapefile”, “Field Containing Old Name”, “Field Containing New Name”, and “Workspace”. I set the data type for the first of these to “Feature Class”. For the next two, I choose “Field”, and for the last I choose “Workspace”. So far, so good. I highlight “Field Containing Old Name”, and in the lower Parameter Properties window, I scroll down to “Obtained from” and choose “Index Shapefile”. Furthermore, since I want to use text strings, I click next to “Filter”, choose “Field”, and then uncheck all but the “Text” box. I repeat this for “Field Containing New Name”, hit “Finish”, and now the script tool appears in my toolbox.

When I double-click, it opens up a dialog box just like any other ArcMap tool. I fill in the fields:

Hit go, and after a few seconds I can see in the ArcCatalog window that my files have been renamed! Yay!

In Part 2, I’ll define a function to modify the names, test it in IDLE, incorporate it into the script, add the necessary parameters, and add a simple little line to the tool validation tab.

Python script to rename image tiles (geotiffs, sids, ecws…) using the index shapefile

Elsewhere I posted a script that I wrote to allow me to do this same task, to rename imagery tiles based on attributes in the index shapefile. It wasn’t perfect, and it had a couple extra steps, but I wrote it pretty quickly and it worked well enough. If you want to, you can read more about it on that other post, or the example application linked from there.

Earlier today, however, I greatly simplified this script so that I can skip the step of exporting the attribute table to a text file and then go through all that parsing. Instead, this uses a search cursor. I wrote this one at home on a machine with ArcGIS 10. This current version basically runs from IDLE, but I’m going to adapt it for a script tool, which is why I have “arcpy.GetParameterAsText()” in there for the input parameters. To run it from IDLE, I just delete those and insert the shapefile, fields, and workspace into the code. After I get it running neatly from within Arc 10, I’ll probably make a 9.3 compatible version to run at work. Here it is:

Python script to rename files based on a shapefile attribute — example application

I thought it might be helpful to post not only scripts that I’ve created, but why and how I use them. As written, these scripts tend to be fairly short and simple, but also very specific to some task I do repeatedly at work, or that I suspect I may have to redo multiple times with slight changes, and which would be tedious to do manually. In short, I thought an example might provide a helpful context for the scripts themselves.

This example uses the script from the post titled “Python script to rename files based on a shapefile attribute”.

At work, I deal with aerial photography. This imagery may come to us in tiles, a grid of geotiffs, for example. Sometimes customers desire that we deliver the imagery to them cut into a different tiling scheme in order to keep consistency with their imagery from previous years, or with some other data layers or boundaries. Sometimes, when compressing an area to .ecw or .sid we may need to break up an area into some arbitrary tiling scheme to balance file size with image quality. In those cases, I find the script from my post “Python script to rename files based on a shapefile attribute” to be useful. I could also see this as possibly be useful with, say, a NED dataset from the USGS with a metadata shapefile.

Perhaps in another post, I’ll go into more detail about how to create a tile index for the imagery — for now I’ll just say in brief that it can be accomplished in a few ways — it can be done using ER Mapper (create a mosaic using the Mosaic Wizard, then create a Mosaic Tile Index Grid using the mosaic) , Global Mapper (select layers in the Control Center, right click, and choose Create Area Features from Selected Layer Bounds), and I think with a little creativity there is some way to do it ArcMap 10 as well.

Anyway…here are some steps and screenshots of how I might use this script:

First, let’s start with a raster dataset–I’ll use imagery –and an index shapefile of some sort.

The attribute table currently has a field called “TIFF” that contains the number of the tiff that fills the extent of the corresponding feature. Now, I want to replace this incomplete, sequential, bottom left to top right numbering scheme with one that is more convenient for our customers using rows and columns. I begin by adding three fields: “TILE”, “Row”, and “Column”.

I use the Select Features tool to select all features in a row, then use the Field Calculator to populate the “Row” field with “A” for the selected features. Rinse and repeat until all records have a value for row and column. Manually selecting rows and columns can be a little unwieldy for a large number of rows and/or columns — I may try to write something to automate this step eventually, but so far I haven’t found it necessary.

Once I have the rows and columns done, I use the Field Calculator to fill the “TILE” column. In this example I simply use the row and column separated by an underscore.

Now, if I change the label field to “TILE” instead of “TIFF” for the index shapefile, the new naming scheme is displayed.

Now I just have to get the names of the tiffs to match! First, from the Attribute Table for the index, I export to a text file. The way I have the script set up, this table has to be in the same folder as the tiles to be renamed.

Now I open the script in IDLE, and make sure the path, text file, and the fields corresponding to the old name and the desired new name are set.

Run the script, and the files are renamed to match the “TILE” field in the index shapefile.

At this point I’d probably load the newly renamed tiffs into ArcMap along with the index and make sure everything is good, then delete the “TIFF” field from the shapefile and LookupTable.txt from the folder to avoid confusing the customer. And, we’re done!

Some hints and resources

I realize I have not been terribly thorough in giving instructions with the scripts I have posted so far. They have been short and relatively simple, but in order to use them, I think you would have to be familiar with the process of running a Python script from IDLE and/or creating a script tool in ArcMap. I am using ArcGIS 10 at home, but at the office I have ArcGIS 9.3.1 installed, so the scripts and the methods by which I run them are different depending if I am at home or at work. I will try to remember to be specific about which I’m using, and give more details about how to run these scripts or import them to script tools and so forth in the future.

In the meantime, though, here are a few hints–

(1) If you see “import arcpy” near the top, I am using a computer with ArcGIS 10 and Python 2.6. If you see “import arcgisscripting” I am using a computer with ArcGIS 9.3.1 and Python 2.5.

(2) If you see lots of variables hard-coded, meaning things like “elevInput = ‘C:\\GIS\\TEST_FOLDER\\elev_10m'” then I have written the script to run from IDLE. If you see “elevInput = arcpy.GetParameterAsText(0)”, then I have written it to be used as a script tool.

(3) If you are saying, “huh?” or don’t know about scripting but want to learn, then I recommend checking out the following resources:

First, get familiar with ESRI’s online help if you aren’t already. For ArcGIS 10, go to http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html and get familiar with the Geoprocessing Tool Reference (Professional Library > Geoprocessing > Geoprocessing Tool Reference). Also check out the bit on Creating Script Tools (and off of that, Customizing Script Tool Behavior) under Geoprocessing. For 9.3.1 Geoprocessing and Geoprocessing Tool Reference are both on the root at http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=welcome. These are extremely helpful as they provide examples and details about the parameters.

Okay, now Python-wise, I recommend Mark Lutz’s book Learning Python (http://www.amazon.com/Learning-Python-Powerful-Object-Oriented-Programming/dp/0596158068/ref=ntt_at_ep_dpt_2). Personally I bought the 3rd Edition because it was cheaper and covers up through 2.6, which is what Arc 10 uses.

Also, through Penn State you can access quite a lot of stuff from an online course for free at https://www.e-education.psu.edu/geog485/. There are other resources worth a look at http://open.ems.psu.edu/courseware, while you’re at it.

And you might as well read through some of Dive Into Python: http://www.diveintopython.net/toc/index.html, and bookmark docs.python.org. Okay, now don’t get overwhelmed– I think it was Lao Tzu who said what has generally been translated to “A journey of a thousand miles begins with a single step.” The journey may seem overwhelming, but you can surely handle a single step. And after that, take another when you are ready. And another. And we’re underway!

Python script tool to export each feature in a feature class to its own new feature class

Here is the script:

Selects each feature in a feature class and exports each to a new shapefile.
Created by John Speargas, 29 Jan 2012

# import modules
import arcpy
from arcpy import analysis, env

# input parameters
in_fc = arcpy.GetParameterAsText(0)
field = arcpy.GetParameterAsText(1)
env.workspace = arcpy.GetParameterAsText(2)

# create search cursor
rows = arcpy.SearchCursor(in_fc)

for row in rows:
where_clause = field + “='” + row.getValue(field) + “‘”
# name the output features based on the value found in the field selected above
analysis.Select(in_fc, row.getValue(field), where_clause)

This script is set up to be added to a toolbox in ArcGIS 10. I set up the script tool parameters as shown in the dialog box below. When you select the second parameter, the field based upon which the new features should be named, in the lower part of the dialog box click next to “Obtained from” and select the input feature class parameter.

MapWindow Measuring Tool and curious NAD 83 State Plane Coordinate System inconsistencies in .prj files

I have been using MapWindow recently, and it seems like a very good program, especially considering it’s free. As my employer no longer provides a viewer to our customers, we have been recommending MapWindow…but…there is a problem. Many of our customers use the California State Plane Coordinate System with units of U.S. feet and, for whatever reason, the measuring tool in the current version of MapWindow (4.8.6) doesn’t seem to be able to deal with feet. I started noodling around with shapefile projections in ArcMap, Global Mapper, and MapWindow and checking out the .prj files in Notepad to see if I could find any clues.

So, I created a shapefile and made a copy. I defined the projection of one in ArcMap, and of the other in MapWindow. Arc was able to display them both correctly, but did not recognize that the projection is actually the same. When I pulled both into MapWindow, the ESRI one failed to reproject and was not displayed. When I opened both in Global Mapper, no problem. So then I exported a new one from Global Mapper, and tried opening all three simultaneously in MapWindow. MapWindow displayed the Global Mapper and MapWindow ones, but again the ESRI one failed. All three open fine simultaneously in both ArcMap and Global Mapper, but the addition of the MapWindow defined layer triggers a warning that it has a different geographic coordinate system than the other two.

A peek at the projection information illuminates that there is some inconsistency in how even the datum is described:

From MapWindow:

NAD83 / California zone 5 (ftUS)
Projection: Lambert_Conformal_Conic
false_easting: 6561666.667000
false_northing: 1640416.667000
central_meridian: -118.000000
standard_parallel_1: 35.466667
standard_parallel_2: 34.033333
latitude_of_origin: 33.500000

Linear Unit: US survey foot
NAD83Datum: North_American_Datum_1983

From ArcGIS:

Projection: Lambert_Conformal_Conic
False_Easting: 6561666.666667
False_Northing: 1640416.666667
Central_Meridian: -118.000000
Standard_Parallel_1: 34.033333
Standard_Parallel_2: 35.466667
Latitude_Of_Origin: 33.500000

Linear Unit: Foot_US
GCS_North_American_1983Datum: D_North_American_1983

From Global Mapper:

Projection: Lambert_Conformal_Conic
false_easting: 6561666.666667
false_northing: 1640416.666667
central_meridian: -118.000000
standard_parallel_1: 34.033333
standard_parallel_2: 35.466667
scale_factor: 1.000000
latitude_of_origin: 33.500000
Linear Unit: Foot_US

GCS_Geographic Coordinate System

Shouldn’t this kind of thing be standardized?

Python script to rename files based on a shapefile attribute

The script is below. In another post, I show an example of how I use this script. It meets a specific need I encounter periodically at work. The application is pretty specific, but I think it is an example of how a simple script can augment GIS skills, so look for that post coming soon. Please note that if you copy this script from this page and paste it into IDLE you will probably have to go through and replace the quote marks (both single and double), as well as the two minus signs, as they seem to copy over incorrectly. This can be accomplished quickly by using “Edit > Replace”, copying the incorrect quotation mark into the “Find” field, and entering the correct one in the “Replace” field. Repeat for each incorrect character.

Okay, here it is:

Renames files in a target folder using a lookup table exported as a text
file from the attribute table of an ESRI shapefile.  The lookup table should
be in the same folder as the target files, or the code changed accordingly.
John Speargas, 2011

# Import module
import os

# Input folder containing files to be renamed
folder = “C:\\GIS\\TEST\\directory_to_be_renamed\\”

# Input lookup table
lookupTable = open(folder + “LookupTable.txt”)

# Read lookup table and close file
tableList = lookupTable.readlines()

# Input fields corresponding to the old and new names
old = “OLD”
new = “NEW”

# Read the lines in the table and strip away unnecessary characters
newTable = []
for line in tableList:
    newLine = line.strip()   
    newerLine = newLine.strip(“,”)  
    newestLine = newerLine.strip(“‘”)   
    almostFinalLine = newestLine.replace(‘”‘,””)  
    finalLine = almostFinalLine.split(“,”)

# Get position of old and new field names
header = newTable[0]
indexOld = header.index(old) – 1
indexNew = header.index(new) – 1

# Make lists corresponding the new and old values
oldList = []
newList = []
for item in newTable:
    if item != newTable[0]:
        valueOld = item[indexOld]
        valueNew = item[indexNew]
# For each file in the folder, parse the old name from (the path
# and-?) the extension
for file in os.listdir(folder):
    nameAndExt = os.path.splitext(file)
    oldName = nameAndExt[0]
    extension = nameAndExt[1]

#   Index the old name in the old list if it is present
    if oldName in oldList:
        indexOldName = oldList.index(oldName)

#       Locate the same index in the “new” list
        newName = newList[indexOldName]
#       Rename the file using the path, the new name, and the extension
        os.rename(folder + oldName + extension, folder + newName + extension)

Cell counts and flow direction, part 2

Let’s see…where did I leave off?

After posting a note on facebook hypothesizing that perhaps the graph shown in the previous post showed that the flow direction algorithm was biased toward the ordinal directions.  Brian agreed that there was an artifact here, perhaps as a result of the algorithm, or possibly of the gridded nature of the elevation raster itself.  Brian suggested a couple of tests, involving conversion from raster to point and reinterpolating the raster, converting to contours and then using topo to raster, or smoothing using focal statistics.  I did not try any of these, because I did not want to open the interpolation can of worms here, per se, because I thought it might essentially introduce another variable and muddy the source(s) of bias further.  I had reprojected the data from NAD83 to State Plane, because otherwise slope never comes out right, and this projection introduced an artifact mesh.  After a little back and forth on this, Greg chimed in that DEMs are highly inaccurate to begin with as a result of data collection.  , and that conversion back and forth only propagates the error.

I realized that the sine wave implies a D-infinity flow algorithms–that is, flow could be calculated in any direction from the cell center.  This is, of course, not what ArcGIS actually does.  What the graph should really show is 8 columns to represent the 8 discrete directional possibilities from one cell to any of its eight neighbors.  What occurred to me at this point is that for any of these 8 possible directions, the cell count will not be the same at a given distance.  If, for example, one measures the distance from the center of a cell to the center of each neighboring cell in the ordinal directions (diagonally), then the distance will be the cell width multiplied by the square root of two, and we can call the cell count 1.  Assuming that cells can be partially counted, then the same distance in the cardinal directions will cover about 1.414 cells.  Therefore, the pattern I noted may be nothing more than the geometry of a square, causing cell counts to be relatively lower in the ordinal directions for exactly the same flow length.  I thought a way to test this might be to clip a circular extent and rotate it.  It should be possible to remove the effect of this geometry by using a factor of the square root of two, and then to see whether the same number of cells is calculated for each angle of rotation (corresponding to the 8 directions in question) when the angle is accounted for.  I made a graph that shows how I believe the pattern should change for an identical area as it is rotated through a full rotation at 15-degree increments.


Also, I made a couple short animations in Windows Live Movie Maker showing the flow directions as the circular clip from the DEM is rotated.  These aren’t so much for analytical purposes as for fun.  They were more fun with music, but I removed it to make the files smaller:




Cell counts and flow direction

Recently, as I was playing around with the DEM for what I will undoubtedly inadvertently refer to as “my watershed”, as if I own it. The watershed in question is a small one…I don’t know the area off the top of my head but on my other screen here I have ArcMap open, so…looks like 707392.5 square meters, according to the Shape_Area field of the watershed polygon I converted from the basin computed from the flow direction based on the filled 1/3″ NED. Bit of false precision there, I’d guess…for convenience and because really what I wanted to express is the order of magnitude, let’s just call it 707000 sq m, which I guess is what… 0.7 square km, as a ballpark figure. It’s at the upper part of the Laguna Canyon watershed, in the sort of northeastern San Joaquin Hills, conceptually defined as the area draining to Barbara’s Lake.
I was looking at the DEM for a squarish area covering an area several times larger than the watershed. Again, I have Arc open, so…about 800 cells on a side, so about 8 km on an edge, 64 sq km overall. And I was looking at the flow direction raster for the area of the watershed and thinking that because the bulk of drainage, that is, the accumulated overland flow, should go southwest toward Laguna Canyon and out to the Pacific, to the adjacent parallel canyon and in the same direction, or to the northeast corner of the map where it goes out onto the plain and turns south, ultimately to the Pacific as well. Clipped in tight, it should all go southwest. But how does the flow accumulation, I wondered, compare with the flow direction. So I opened up the attribute tables and took a look at the count of each cell classification in the flow direction raster. I won’t go deep into the various tools I mention here…rather, I would refer you to ESRI’s online desktop help page http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//009z00000052000000.htm, where you can find more detail. I copied the numbers over to Excel and made first something that looked kind of like a scree plot, for those who know what that is. The labels below the horizontal axis were ordered highest count to lowest. To my initial surprise, the majority of cells were classified east, followed by west. The other directions all had much lower values, but north and south were next. I could see most of the flow being east and west, perpendicular to the ridges and the channels–that makes perfect sense, but why then north and south, rather than northeast, southeast, northwest, southwest? I sorted in numerical order so that the axis began with due east and went around in a circle clockwise. With a smooth line fit through the numbers, it looks a bit like a sin wave with some kind of amplitude boost every other upward peak. Really, because ArcGIS calculates flow to the 8 neighboring cells, a more appropriate graph would reflect the discrete nature of the possible values, but I include the graph here uncorrected in this regard.
I posted a note on facebook showing this graph, and pondering whether this pattern, each peak corresponding to one of the cardinal directions (E, S, W, N) and each trough one of the ordinal (SE, SW, NW, NE), was an artifact of the gridded data, perhaps bias introduced by something in the flow direction algorithm. I received some feedback from colleagues at Long Beach, Brian Nagy and Greg Ziolkowski, stimulating and combining with some of my own evolving thoughts on the matter, which I well get into next blog.