Thursday, April 23, 2015

Post 5: Network Analysis

                                             Goals and Objectives

The purpose of this assignment is to perform network analysis (NA) on frac sand mines in Wisconsin. Feature class data were prepared for NA in a Python scripting exercise (Pyscripting 2 Link).
 Using the 'Model Builder (MB)' program in ArcGIS, the objectives for the exercise are as follows:
  • Load features into the Network Analysis interface
  • Calculate a route
  • Build a model to calculate the closest facility route
  • Calculate the cost of sand truck travel on roads by county

                                                          Notes

 The following table summarizes the data that were used for the NA exercise as well as the sources associated with the datasets:
 
Table 1. Summary of datasets and sources used for network analysis.
 
 

 
  It should be noted that the dollar value used to calculate the cost of sand truck travel on roads by county is hypothetical.
 

                                                         Methods

  Network Analyst is a set of tools and functions within ArcGIS. NA tools can solve logistical problems, such as routing sand from sand mines to rail terminals along the most efficient pathways. Once such pathways are determined, other ArcMap tools can be used to assess the costs associated with using those routes. The NA load locations interface (fig. 1) was first used to generate a route in ArcMap before the same task was performed with Model Builder. Generating a route in the familiar ArcMap interface (fig. 2) allowed its results to be compared with those produced when MB (fig. 3) was used to perform the same function.


Figure 1. NA interface to load locations for network analysis. The interface above is for loading incidents (sand mine facilities). A similar interface was used to load facilities (rail terminal locations).
 
Load Features onto NA Interface. ESRI street features were loaded into ArcMap (fig. 1), along with the rail terminal features and sand mine locations. Sand mine location data were previously adapted for the current exercise using Python scripting (see link in introduction). Adaptation of the data included creating a feature class of sand mines that did not have a rail terminal within 1.5 km. This feature class was made because sand is unlikely to be hauled on roads if the facility it originates from has its own rail terminal.
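The 1.5 km filter described above was built in the earlier Python scripting exercise, which is not reproduced here. A minimal sketch of the same idea in plain Python, assuming projected coordinates in meters and straight-line (Euclidean) distance, might look like the following; the mine and terminal IDs and coordinates are hypothetical.

```python
import math

# Hypothetical projected coordinates in meters (x, y); IDs and values are invented.
MINES = {"mine_a": (500000.0, 4900000.0), "mine_b": (520000.0, 4910000.0)}
TERMINALS = {"term_1": (500900.0, 4900400.0), "term_2": (560000.0, 4950000.0)}


def mines_without_terminal(mines, terminals, radius_m=1500.0):
    """Return IDs of mines with no rail terminal within radius_m (straight-line distance)."""
    result = []
    for mine_id, (mx, my) in mines.items():
        has_nearby_terminal = any(
            math.hypot(mx - tx, my - ty) <= radius_m for tx, ty in terminals.values()
        )
        if not has_nearby_terminal:
            result.append(mine_id)
    return result


print(mines_without_terminal(MINES, TERMINALS))  # ['mine_b']
```

Only the mines returned by such a filter would need to haul sand over roads, which is why they were the ones routed in the network analysis.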
 
Calculate Route. Once all the data were properly loaded into the NA interface, a route was generated between the sand mines and rail terminals (fig. 2). The closest facility route was generated by selecting the Solve button on the interface.
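Network Analyst's Closest Facility solver is ESRI's own implementation, but the core idea, finding the cheapest network path from an incident to whichever facility is nearest, can be illustrated with Dijkstra's shortest-path algorithm. The sketch below is a simplified stand-in, not ESRI's algorithm; the road graph, node names, and lengths are hypothetical.

```python
import heapq

# Hypothetical road network: node -> list of (neighbor, road length in miles).
ROADS = {
    "mine_1": [("a", 4.0), ("b", 2.0)],
    "a": [("mine_1", 4.0), ("term_x", 3.0)],
    "b": [("mine_1", 2.0), ("c", 6.0)],
    "c": [("b", 6.0), ("term_y", 1.0)],
    "term_x": [("a", 3.0)],
    "term_y": [("c", 1.0)],
}


def closest_facility(graph, incident, facilities):
    """Return (facility, cost, path) for the cheapest route from incident to any facility."""
    dist = {incident: 0.0}
    prev = {}
    heap = [(0.0, incident)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        if node in facilities:
            # The first facility popped is the nearest one by network distance.
            path = [node]
            while path[-1] != incident:
                path.append(prev[path[-1]])
            return node, d, path[::-1]
        for neighbor, length in graph.get(node, []):
            new_dist = d + length
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                prev[neighbor] = node
                heapq.heappush(heap, (new_dist, neighbor))
    return None  # no facility reachable


print(closest_facility(ROADS, "mine_1", {"term_x", "term_y"}))
# ('term_x', 7.0, ['mine_1', 'a', 'term_x'])
```

Note that the straight-line nearest terminal is not necessarily the network-nearest one, which is why routing over the street network matters for this analysis.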



Figure 2. Closest routes generated between facilities (rail terminals) and incidents (sand mine facilities) using the NA tool in the ArcMap interface. 


Build a Model. Model Builder was used to determine the closest route between rail terminal facilities and sand mine incidents (fig. 3). Model Builder was also used to create a layer from the routes using the 'Select Data' and 'Copy Features' tools. Once a route was generated, a batch projection tool was used to project the mine, terminal, and route datasets into the NAD_1983_HARN_Transverse_Mercator coordinate system so that they would match the coordinate system used for the Wisconsin county data (fig. 3).

 
 

Figure 3. Model of the tools (rectangles), input feature classes (blue and green ellipses), and output datasets (blue ellipses) used in the network analysis to find the closest routes between sand mines and the nearest rail terminals.
 

Calculate Cost. To calculate the annual cost of sand truck traffic by county, the length of route in each county (in miles) needed to be determined. The Tabulate Intersection tool was used to determine the length of sand truck routes through each county (Brost, 2014). The tool was useful because its parameters allowed the lengths of roads in each county to be reported directly in miles. MB was used to generate the table containing the distances calculated by the Tabulate Intersection tool (fig. 4). Additionally, the "County_FIP" field was preserved in the new table so that it could be joined to the county feature class in ArcMap for data comparison (table 2).
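Tabulate Intersection performs the overlay of routes and county polygons itself. The sketch below covers only the bookkeeping that follows, assuming the routes have already been split at county boundaries and that segment lengths are in meters from a projected coordinate system; the segment list and the County_FIP values "FIP_A" and "FIP_B" are hypothetical.

```python
from collections import defaultdict

METERS_PER_MILE = 1609.344  # international mile

# Hypothetical route segments already split at county boundaries:
# (County_FIP, segment length in meters).
SEGMENTS = [("FIP_A", 3218.688), ("FIP_A", 1609.344), ("FIP_B", 804.672)]


def miles_by_county(segments):
    """Sum route length per county, converting meters to miles."""
    totals = defaultdict(float)
    for county_fip, length_m in segments:
        totals[county_fip] += length_m / METERS_PER_MILE
    return dict(totals)


print(miles_by_county(SEGMENTS))
```

Keeping the County_FIP key in the output mirrors the reason the field was preserved in the exercise: it is what allows the mileage table to be joined back to the county features.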



Table 2. The table generated by the Tabulate Intersection tool was joined with the attribute table for Wisconsin county boundaries via the County_FIP field.
 

  Once the length of road running through each county was determined, an annual cost per county was calculated based on a dollar amount per mile. The hypothetical dollar amount used to generate the cost was $0.022/mile. The assumption is that each mine facility will haul 50 truckloads of sand per year. Calculations also accounted for the fact that each truck would have to return to the sand mine facility after delivering its load to the rail depot (eqn. 1). MB was used to calculate the annual cost of road use per county by adding a field to the table generated by the Tabulate Intersection tool and applying the Field Calculator to the new field.
 
 
 
 
 
Figure 4. Model used to determine the miles of road found in each county and to apply the equation to determine the annual cost incurred per county due to the transportation of frac sand.
 
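$$\text{Annual Road Use Cost Per County} = M_{\text{county}} \times 0.022 \times 2 \times 50 \qquad \text{Equation 1.}$$
where $M_{\text{county}}$ is the route mileage in the county from the Tabulate Intersection tool, 0.022 is the hypothetical cost in dollars per mile, 2 accounts for the return trip, and 50 is the number of truckloads per year.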
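The cost calculation described above (the per-mile rate applied to each county's route mileage, doubled for the return trip, for 50 truckloads a year) can be expressed as a small helper like the one below. It is a sketch that reproduces the Field Calculator step outside ArcGIS; the county keys and mileages in the usage lines are hypothetical.

```python
COST_PER_MILE = 0.022     # hypothetical $/mile used in the exercise
TRUCKLOADS_PER_YEAR = 50  # truckloads hauled per mine per year
ROUND_TRIP_FACTOR = 2     # each truck returns to the mine after unloading


def annual_road_use_cost(route_miles):
    """Annual road use cost for one county, given its route mileage."""
    return route_miles * COST_PER_MILE * ROUND_TRIP_FACTOR * TRUCKLOADS_PER_YEAR


# Hypothetical mileages per county (e.g., from Tabulate Intersection).
county_miles = {"FIP_A": 100.0, "FIP_B": 1.0}
county_costs = {fip: round(annual_road_use_cost(mi), 2) for fip, mi in county_miles.items()}
print(county_costs)
```

At these rates every mile of route costs a county 0.022 x 2 x 50 = $2.20 per year, so, for example, an annual cost of about $690 corresponds to roughly 314 miles of route.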
 

                                                       Results

The routes between sand mine locations and rail terminals are shown in figure 5. Some of the original routes entered Minnesota from the east because the closest rail depots to some Wisconsin sand mines were in that state. However, the portions of these routes in Minnesota were clipped, as the project is meant to determine the costs incurred by Wisconsin counties for transportation across their roads.
 
  The Tabulate Intersection tool used to determine the mileage of routes through each county seemed to be fairly accurate. To check it, the Measure tool was used to crudely estimate (at 1:1,000,000 scale) the distance of routes in four counties, and these estimates were compared with the results generated by the Tabulate Intersection tool (table 3). While three of the counties, Clark, Outagamie, and Dunn, had distances fairly close to those generated by the Tabulate Intersection tool, the Measure tool estimate for Eau Claire County differed by a considerable amount (~50%; table 3).
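The comparison in table 3 comes down to a relative difference between two length estimates for the same county. A minimal sketch, assuming the Tabulate Intersection value is treated as the reference, is shown below; the mileages in the example are hypothetical.

```python
def percent_difference(tool_miles, measured_miles):
    """Absolute difference of a manual estimate relative to the tool value, in percent."""
    if tool_miles == 0:
        raise ValueError("tool_miles must be non-zero")
    return abs(measured_miles - tool_miles) / tool_miles * 100.0


# Hypothetical example: the tool reports 20 miles, the Measure tool estimate is 30 miles.
print(percent_difference(20.0, 30.0))  # 50.0
```

Which value is used as the denominator changes the percentage, so it is worth stating the reference explicitly when reporting such comparisons.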
 
 
Figure 5. Map showing routes between sand mine facilities and rail terminal depots. The route feature class was generated using MB.

 
 

Table 3. Comparison of lengths generated by the Tabulate Intersection tool and estimated using the Measure line tool on the ArcMap interface.


  The county with the highest cost associated with frac sand trucking is Chippewa, at around $690/year (tables 3 and 4; figs. 6 and 7). Chippewa County's annual cost for frac sand transportation is more than double that of the second- and third-highest counties, Monroe and Wood, at $331 and $329, respectively.

The counties with the least amount of frac sand through traffic were Winnebago and Burnett counties at $2 and $3, respectively (tables 3 and 4; figs. 6 and 7).

  Six counties incurred costs due to frac sand transportation across their borders, although none of them has a mine or a rail depot (figs. 6 and 7). The counties were Eau Claire, Vernon, La Crosse, Dane, Dodge, and Winnebago, at $103, $73, $237, $14, $62, and $2, respectively. As mentioned previously, however, the mileage across Eau Claire County appears to be erroneous.

Six counties would route their sand through Minnesota if they chose to do so via the shortest route distance-wise (figs. 6 and 7). Four counties, Burnett, St. Croix, Pierce, and Pepin, would route their sand exclusively through Minnesota, while the remaining two, Trempealeau and Buffalo, have routes through both Minnesota and Wisconsin.




Table 4. Attribute data from the joined tables (table 2) were cleaned up and organized in Microsoft Excel.
 

Figure 6. Graph illustrating the annual cost to transport sand across each county, based on the mileages generated for each county with the Tabulate Intersection tool.
 

Figure 7. Map showing annual estimated costs incurred by counties affected by sand mine traffic to and from rail depots.
 

                                                          Conclusions

 Network Analysis and Model Builder generally provided good tools for calculating the closest routes between sand mines and rail terminals. However, although the distances generated by the Tabulate Intersection tool and the Measure tool in the ArcMap interface were generally close to one another (table 3), Eau Claire County's distances differed by about 50%. There is no clear explanation for the gross error between the two distances generated for Eau Claire County, especially considering that Eau Claire has only one route segment transecting it, while counties with two (~4.6%) or even six segments (~2.5%) agreed more closely.
 
  Given such disagreement between the distance values generated by the Tabulate Intersection tool and those estimated manually with the Measure tool, the accuracy of the distance data for the untested counties is questionable as well. In the future, it might be interesting to generate route distances by county using two (or more) methods and then perform a more in-depth analysis of the error between the methods.
 

                                                Works Cited

 
 Brost, S., 2014, Network analysis, Fall 2014 Geog 337:
         http://fracsandgeog337.blogspot.com/2014/11/network-analysis.html (accessed April 2015).
 
 
                                                          
 


Friday, April 10, 2015

Post 4: Geocoding

                                                          Goals and Objectives

  The goal of this exercise is to geocode 18 Wisconsin sand mine locations so that they can be used in the next exercise, routing. Spatial and descriptive data for the mines were presented to the class in an un-normalized Microsoft Excel file, as would be expected if the Wisconsin DNR were to provide such data (fig. 5). Each student was assigned 18 mines from the Excel file to normalize before geocoding.
 
  Data from the Excel file were geocoded using an ESRI geocoding service. Mines located with the geocoder were compared with real-world mine locations to ensure that the addresses produced by the geocoder were as accurate as possible; if the geocoder produced an erroneous match, other data, e.g. Public Land Survey System (PLSS) descriptions and Google Earth imagery, would need to be used to locate the mine (figure 2).

  When all of the data were geocoded, the results were merged across the class, producing results for 148 mines. Each mine had a unique identification field that was used to compare the locations of the geocoded mines (as a class) with the actual locations of the mines in order to determine our accuracy.

 

                                                       Methods  

Normalizing Wisconsin DNR Data
  In order to use the ESRI geocoder, the sand mine data provided by the Wisconsin DNR needed to be normalized. Columns in the Excel spreadsheet needed to be sorted so that each column contained only one piece of information. For example, in Figure 5 the 'Address' column for many of the mines contained several pieces of data, such as street address, zip code, town/city, and PLSS data. In addition to giving each piece of information its own column, so that the ESRI geocoding tool worked as efficiently as possible, a PLSS field was created in the normalized table. PLSS information is useful when the geocoder fails to locate a mine and the mine has to be located manually, as discussed in the 'Geocoding Mines' section of the report.
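Splitting a combined address field into separate columns can be scripted when the entries follow a predictable pattern. The sketch below assumes a hypothetical "street, city, state zip, PLSS" layout; the real WDNR entries varied, so rows that do not match the pattern are returned as None for manual normalization. The sample address is invented.

```python
import re

# Hypothetical un-normalized 'Address' value; real WDNR rows varied in format.
RAW = "N1234 County Road Q, Sampletown, WI 54999, Sec 12 T28N R9W"

ADDRESS_RE = re.compile(
    r"^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})"
    r"(?:,\s*(?P<plss>.+))?$"  # PLSS description is optional
)


def normalize(raw):
    """Split one combined address string into separate fields, or return None."""
    match = ADDRESS_RE.match(raw.strip())
    if match is None:
        return None  # leave this row for manual normalization
    # Optional groups that did not match come back as None; store them as empty strings.
    return {key: (value.strip() if value else "") for key, value in match.groupdict().items()}


print(normalize(RAW))
```

Keeping the PLSS description in its own field, as the exercise did, means it is still available when the street address fails to geocode.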

Geocoding Mines
  The normalized table was added to ArcMap and loaded into the ESRI geocoding program. Sand mines were geocoded based on their addresses and the geocoder matched 15 out of the 18 mines, with 2 unmatched mines and one tie (fig. 1). A geocoding shapefile was also produced that contained the geocoded locations (fig. 2).
  In order to examine the 3 mines that were not matched by the geocoder, and to check the accuracy of the 15 matched locations, Rematch (fig. 1) was selected. The ESRI 'World Imagery' basemap was used (fig. 1) so that the points in the 'geocoding results' could be compared with the locations of mines on the basemap (where they existed) using PLSS data. For example, PLSS information was overlain onto the basemap, and when the property was located down to the quarter-section (fig. 3), the address was picked manually.
  Because the imagery on the ESRI basemap was outdated, Google Earth was also used to check the accuracy of the geocoded mine locations. If the location of an automatically geocoded mine was found to be inaccurate, the 'Interactive Rematch' interface was used to manually select a location based on basemap and Google Earth verification (fig. 4).

 
 
Figure 1. Initial results of running the ESRI geocoder.
 
 
 
 
Figure 2. Point shapefile of sand-mine locations automatically located with the ESRI geocoder with ESRI basemap imagery to aid in manual location.
 
Figure 3. PLSS descriptions for unmatched (or poorly matched) mines were verified using PLSS data from the WDNR and ArcGIS PLSS data down to the quarter-section (red squares).
 
 Figure 4. Interactive Rematch interface used to manually locate mines based on PLSS and imagery (ESRI basemap and Google Earth).
 
Geocoded Mine Accuracy
 Merge. Geocoded mines from each student were merged with the ArcMap Data Management 'Merge' tool. To run the tool, the tables from each student had to be consistent with one another. For example, one student renamed their 'Mine_Unique_ID' field to 'Mine_ID'. To run the Merge tool with all the appropriate fields, that student's 18 unique mine IDs were re-entered into a new field named 'Mine_Unique_ID'.
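The field-name fix described above amounts to mapping each student's field names onto a common schema before merging. The plain-Python sketch below uses lists of dictionaries as stand-ins for attribute tables (an assumption; the actual work used the ArcMap Merge tool). The 'Mine_ID' to 'Mine_Unique_ID' mapping comes from the text; the sample rows are hypothetical.

```python
# Alias -> standard field name; 'Mine_ID' is the renamed field mentioned in the text.
FIELD_ALIASES = {"Mine_ID": "Mine_Unique_ID"}


def harmonize(rows, aliases=FIELD_ALIASES):
    """Rename aliased fields in each row to the standard field names."""
    return [{aliases.get(field, field): value for field, value in row.items()} for row in rows]


def merge_tables(*tables):
    """Harmonize each student's table, then append all rows into one table."""
    merged = []
    for table in tables:
        merged.extend(harmonize(table))
    return merged


# Hypothetical rows from two students' tables.
student_a = [{"Mine_Unique_ID": 1, "Address": "x"}]
student_b = [{"Mine_ID": 2, "Address": "y"}]
print(merge_tables(student_a, student_b))
```

Agreeing on field names before anyone starts editing avoids this step entirely, which ties in with the standardization point made in the Conclusions.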
 
 Distance. Once the class's mine data were merged into one shapefile, the 'Point Distance' tool was used to create a table (fig. 5). The input features were the merged class mine data, and the near features were the "all_mines" shapefile, which contained the actual, accurate locations of the mines; the accurate mine data were provided after each student geocoded their respective mines. Before the Point Distance tool was used, both the all_mines and merged class data shapefiles were projected into a state coordinate system, NAD_1983_2011_Wisconsin_TM (meters). A search radius of 1000 meters was used in the Point Distance tool, and the resulting distance table (distance table) excluded 56 mines that had no actual mine location within that radius.
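Conceptually, a point-distance table with a search radius can be produced by comparing every input point with every near point and keeping the pairs that fall inside the radius. The sketch below illustrates that idea in plain Python with projected coordinates in meters; the row layout (input FID, near FID, distance) is modeled loosely on the Point Distance output table, and the coordinates are hypothetical.

```python
import math


def point_distance(inputs, nears, radius_m=1000.0):
    """Return (input_fid, near_fid, distance) rows for every pair within radius_m.

    FIDs are list positions; coordinates are projected (x, y) in meters.
    """
    rows = []
    for input_fid, (ix, iy) in enumerate(inputs):
        for near_fid, (nx, ny) in enumerate(nears):
            distance = math.hypot(ix - nx, iy - ny)
            if distance <= radius_m:
                rows.append((input_fid, near_fid, distance))
    return rows


# Hypothetical geocoded (class) points and actual (all_mines) points.
class_mines = [(0.0, 0.0), (5000.0, 0.0)]
all_mines = [(300.0, 400.0)]
rows = point_distance(class_mines, all_mines)
# Inputs with no row had no actual mine within the radius.
unmatched = set(range(len(class_mines))) - {row[0] for row in rows}
print(rows, unmatched)  # [(0, 0, 500.0)] {1}
```

The radius keeps the table small and drops pairs too far apart to be meaningful, which matches the reasoning given for the 1000 meter radius in the Results.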
 
Join. The merged class data were joined with the distance table. To do the join, a new field, Input_FID, was created in the class mine data shapefile to correspond to the field of the same name generated in the distance table. The Input_FID values corresponded to the Object_ID of the input shapefile (i.e. the class mine data).
  Once the new field (Input_FID) was created in the merged sand mine shapefile, its attribute table was joined with the distance table generated by the Point Distance tool; the joined table displayed the unique mine IDs as well as the distances between actual mine locations and class-geocoded locations that were less than 1000 meters apart.
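The join step can be sketched as a left join keyed on the input FID, so that mines with no row in the distance table receive None (the equivalent of the "<null>" values reported in the Results). Keeping the shortest distance when an input matched more than one near feature is an assumption of this sketch, not something the exercise specifies; the mine records and rows are hypothetical.

```python
def join_distances(mines, distance_rows):
    """Left join distance rows onto mine records by input FID (list position).

    Mines with no row get Distance=None, like "<null>" in the joined table.
    Keeping the shortest distance per input is an assumption of this sketch.
    """
    shortest = {}
    for input_fid, _near_fid, distance in distance_rows:
        if input_fid not in shortest or distance < shortest[input_fid]:
            shortest[input_fid] = distance
    return [{**mine, "Distance": shortest.get(fid)} for fid, mine in enumerate(mines)]


# Hypothetical mine records and Point Distance-style rows (input_fid, near_fid, distance).
mines = [{"Mine_Unique_ID": 1}, {"Mine_Unique_ID": 2}]
rows = [(0, 0, 640.0), (0, 3, 220.0)]
print(join_distances(mines, rows))
```

Because the key is a record position rather than the mine's unique ID, this kind of join only works if the FIDs are not changed between running the distance tool and doing the join.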
 
 

                                                           Results

 
Normalization of Wisconsin DNR Data
  Sand mine data from the WiDNR were not in a form suitable for geocoding (fig. 5). For example, the address column contained not only the street address but, in many cases, the zip code, city, state, and PLSS information (fig. 5; highlighted portion).
  Such information was separated into individual columns so that the ESRI geocoding program could be run in ArcMap; the result was a normalized table (fig. 6).
 
 
 
Figure 5. Un-normalized table of sand mine data that was provided by the WiDNR.
 
 
Figure 6. Normalized table of sand mine data with a separate column for each piece of address information: street address, city/town/village, zip, county, state, PLSS.
 
 
Geocoded Mines
 
  Eighteen sand mines were geocoded using the ESRI geocoding program. All 18 of the mines were ultimately located manually because none of them was geocoded correctly by the program. Locations were picked using ESRI basemap imagery, PLSS shapefiles (in conjunction with PLSS data provided by the WiDNR), and Google Earth. The geocoded shapefile was projected into a Wisconsin state coordinate system, NAD 1983 (2011) Wisconsin TM (meters); the data are displayed in figure 7.
 
Once each individual in the class had geocoded their 18 mines, the instructor provided a new shapefile (All Mines) showing the sand mines' actual locations. The 18 mines geocoded by the author of this report (Luczak Mines) are compared spatially with the accurately placed All Mines in figure 8.
 
  Once the current GIS class's geocoded mines were merged and projected into an appropriate coordinate system, the distances between the class mines (Class Mines shapefile) and the actual locations of the mines (All Mines) were calculated (fig. 9). A search radius of 1000 meters was used to calculate distance. One reason for selecting such a radius was economy of data: without a radius, every location in the Class Mines shapefile was compared with every point in the All Mines shapefile, which resulted in over 500 comparisons. Also, any distance greater than 1 km between a geocoded mine and the actual mine was likely the result of gross error and ultimately worthless. A total of 56 of the Class Mines were located more than 1 km from the actual mines. Ten of my geocoded mines were more than 1 km from the locations of the actual mines and are given a value of "<null>" (fig. 9).
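The share of class-geocoded mines that fell outside the 1 km radius follows directly from the null distances in the joined table. The sketch below counts None values in a distance column; the placeholder list reproduces only the class totals (56 nulls out of the 148 merged mines), not the actual distances.

```python
def share_beyond_radius(distances):
    """Count None distances (no match within the radius) and their share in percent."""
    if not distances:
        raise ValueError("distances must not be empty")
    missing = sum(1 for d in distances if d is None)
    return missing, 100.0 * missing / len(distances)


# Placeholder column reproducing only the class totals: 56 nulls among 148 mines.
# The 0.0 values stand in for the 92 matched distances, which are not listed here.
distances = [None] * 56 + [0.0] * 92
missing, percent = share_beyond_radius(distances)
print(missing, round(percent, 1))  # 56 37.8
```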
 
 
 
Figure 7. Map of the 18 mines that were initially geocoded for the project.
 
 
Figure 8. Map showing the 18 mines that were geocoded for this project (Luczak Mines) versus the actual mine locations (All Mines).
 
Figure 9. The distance between the 18 mines that were geocoded for this project ('Distance' field) and the actual mines.
 
 

                                                        Discussion

  Error is common in geographic data due to numerous factors such as data quality, operational commands, and data collection methods. Errors found in geographic information are grouped into two categories that reflect the characteristics of the errors (Lo and Yeung, 2007). The two categories, operational and inherent, are summarized in the table in figure 10. Both operational and inherent errors can be gross (mistakes/blunders), systematic (mechanical defects in collection tools or changing environmental conditions), or random (errors that remain after gross and systematic errors are accounted for) (Lo and Yeung, 2007).
 
  Inherent errors result from the fact that all maps are merely scaled representations of the Earth, which is far too complex to be modeled at a 1:1 scale. In contrast, operational errors result from the collection and management of geospatial data. Both operational and inherent errors were encountered in the previous exercise and are discussed below.
 
  The geographic data provided by the WiDNR are likely more accurate than the data geocoded by the class, because the WiDNR verified sand mine locations by going into the field and collecting GPS points at those locations. GPS verification of sand mine locations still likely contains inherent and operational errors, such as field survey measurement errors due to instrument limitations and minor operational errors in sampling procedures (fig. 10). However, assuming the WiDNR has set up standard operating procedures (SOPs) for collecting GPS data in the field, gross operational errors in its data collection are probably lower than those of the class, whose mine locations were geocoded from address data.
 
  Class Mine error was probably due to gross operational errors, especially when locating mines for manual geocoding. For example, even the geocoded mine closest to its All Mines location was placed 36 meters away (Unique ID #106). Furthermore, roughly 38% of the mines geocoded by the class (56 out of 148) were calculated to be more than 1 km from the mine locations determined from GPS information. While 1 km may not seem far off in terms of accuracy, imagine if the mine locations were actually houses: if emergency services were dispatched to locations 1 km away from where they needed to be, people's lives could be at risk.
 
  Other types of error, both inherent and operational, are likely present in the geocoded data as well; however, they are likely overshadowed by the inconsistencies caused by gross operational error when the mines were geocoded. Other types of error that were likely overshadowed in the Class Mine data include, but are not limited to, inherent and operational error from projecting the shapefiles into NAD_1983 format and numerical rounding during computation (fig. 10).
 
  Of course, all error was determined based on the assumption that the GPS data provided by the WiDNR were the most accurate dataset. If the WiDNR data were inaccurate due to gross operational errors in data collection, then all error calculations between their data and the class's data could be false. However, it is unlikely that the DNR's field data are less accurate than the data geocoded by the class.
 
 
 
 
Figure 10. Screenshot of a table summarizing error sources and types. From Lo and Yeung (2007).
 

                                                Conclusions

  The purpose of the geocoding exercise was to stress the importance of data integrity and standardization. Standardizing the data at all stages of the exercise would likely have resulted in more accurately geocoded data at the end of the exercise. For example, each student probably normalized their table differently, making it hard to tell who was and who was not accurate.
 
  The exercise also established that all geospatial data one receives should be questioned for integrity. For example, it was only after realizing the hodge-podge nature of the class's geocoded data (my own included) that I began to question the accuracy of the WiDNR's data: How accurate are their geospatial data regarding sand mine locations? Did the DNR establish standard operating procedures (SOPs) for the collection of GPS points? If such SOPs were established by the DNR, were they followed by all technicians collecting data on the mines? Even if all GPS data collected by the DNR were collected consistently and according to established SOPs, did they assess the data for error? If so, how did they make such assessments?
 

                                                       Works Cited

 
 
Lo, C.P., and Yeung, A.K.W., 2007, Concepts and Techniques of Geographic Information
              Systems: Upper Saddle River, New Jersey, Prentice Hall, 544 p.