Bi-dimensional Regression Revisited more

draft...

Bi-Dimensional Regression Revisited: Notes toward a Characterization of Historical Accuracy and a Theoretical Foundation for Analytic Historical Cartometry John Hessler Geography and Map Division, Library of Congress Abstract From what rests on the surface one is led into the depths1. The question of the “accuracy” of an historical map has always been a problematic concept for historians of cartography. Much practical discussion in the scholarly literature has focused around the meaning and definition of accuracy and whether or not it is an absolute or relative concept. Is there any historical import to be gained by describing an historical map in terms of modern notions of accuracy or is the concept more useful and more precisely defined in terms of relative accuracy? Philosophically, this question also has both an epistemological and ontological component. From an epistemological perspective the historian wonders whether one can answer, in either relative or absolute terms, if one map is more accurate than another. How does one go about making an assessment of the rather vague concept of “accuracy” and if this is to be a mathematical concept, does the resulting statistical number have any semantic meaning or historical causality? From the ontological perspective the question is even more complex giving rise to categorical considerations about the structure and makeup of the visual representation of geographical objects in historical time2. What is an historical map visualizing and can we make the a priori assumption that it’s creator made it to be “accurate”, even if we have an accepted definition of the concept? In a recent article on the progress of the methodology of the history of cartography, the eminent geographer and writer Mark Monmonier3, called the analytical methods utilized by the author of this paper for the statistical analysis of historical maps4, “structural and geometric mining”, and wondered how far these statistical and geometric methods could be pushed and extended. The following article is a response to this question and also attempts to answer some of the philosophical dilemmas noted above, in light of these new mathematical methods. The answers this article gives are preliminary, but show in some detail the power of these methods in answering open historical questions regarding the accuracy of historical cartography and the complex considerations that historians make regarding a map’s geographic sources. The conceptual and mereological questions regarding geographic representation discussed in the paper also attempt to provide a sound theoretical foundation for further studies in what the author terms the newly emerging field of “Analytic Historical Cartometry”. 1 Edmund Husserl, Crisis of European Sciences and Transcendental Phenomenology: An Introduction to Phenomenological Philosophy, trans. David Carr (Evanston, Ill.: Northwestern University Press, 1970) 355. 2 Achille Varzi, “Introduction: Philosophical Issues in Geography”, Topoi, 20 (2001), 119-130, “Vagueness in Geography”, Philosophy & Geography, 4 (2001), 49-65, and “Fiat and Bona Fide Boundaries” (with Barry Smith), Philosophy and Phenomenological Research, 60 (2000), 401-420. 3 Mark Monmonier, “Cartography: the multidisciplinary pluralism of cartographic art, geospatial technology, and empirical scholarship” Progress in Human Geography, 31 (2007) 371-379. 4 John Hessler, “Warping Waldseemüller: A Phenomenological and Geometric Study of the 1507 World Map,” Cartographica, 4 1(2006) 101-114. 1 Introduction: The Use and Abuse of Accuracy It must be remembered that mathematics is no more abstract or schematic than ordinary science, functions suitable for the theory of functions really occur in nature. Mathematics is not confined to ultimate objects5. The question of accuracy in historical cartography has always been a vital one and according to J.B. Harley6 it is also one of the least understood. The word accuracy is frequently employed by historians of cartography with respect to early maps, both when discussing their creation, and, in the analysis of their historical use. The word however is almost always misused because of a simplistic understanding of statistical reasoning, transformational mathematics, and the unqualified meaning that is sometimes attached to the words “accurate” and “inaccurate”, adjectives that cannot easily be applied to objects as complex as historical maps. Accuracy is really composed of two independent parts. One is relative, and associated with internal consistency. The other is absolute and concerned with external correspondence. Relative accuracy is a measure of the error of individual features as they occur on the same map. Absolute accuracy, on the other hand, is a measure of the error of the locations and the geometrical shape of what is displayed on the map compared with known locations and shape on the actual earth. 5 Frank Plimpton Ramsey, Notes on Philosophy, Probability and Mathematics (Naples: Bibliopolis, 1991) 56. 6 J.B. Harley and M.J. Blakemore, “Concepts in the History of Cartography: A Review and Perspective,” Cartographica 17 (1980): 54 2 The absolute accuracy of early maps has been a theme in cartographic literature probably since the method of distortion grids was invented in the 1840s. This method examines the distortions in two surfaces by comparing the grids derived from an arbitrary metric system on an early map and the modern equivalent of the same geographic area. Nicolas Tissot in his famous Memoire of 1881 published one of most important early characterizations of accuracy and distortion7. The main contribution of Tissot consists in the development of the theory of the geometric deformation indicator that is termed the Indicatrix. The Indicatrix is an ellipse drawn on the map whose shape varies due to bi-dimensional changes in distortion when transforming two surfaces. Although it has been used mostly in the study of the deformation induced by projections it has also been applied to distortion calculation on early maps. More recently, with the advent of faster algorithms for the solution of partial differential equations, the ability to efficiently solve more complex comparative transformations, the availability of Geographic Information Systems (GIS), and the emergence of specialty software packages such as MAPANALYST8, the appearance of studies on the accuracy of early maps has begun to accelerate. Studies by Hessler9, Hu10, Fostner11, 7 Nicolas August Tissot, Memoire sur la representation les surfaces et les projections des cartes géographiques. (Paris: 1881). 8 http://www.ika.ethz.ch/mapanalyst 9 John Hessler, “Warping Waldseemuller: A Phenomenolgical and Geometric Study of the 1507 World Map,” Cartographica (forthcoming June 2006) 10 Bangbo Hu, “Assessing the Accuracy of the Map of the Prefectural Capital of 1261 Using Geographic Information Systems,” Professional Geographer 53 (2001): 32-44 11 G. Fostner and M. Oehrli, “Graphische Darstellungen der Untersuchungs ergebnisse alter Karten und Entwicklung der Verrungsgitter.” Cartographic Helvetica 17 (1998): 35-43 3 Bretterbauer12 and Fuse13 are just a few that have appeared recently. The increased application of these mathematical tools has opened up a space for an experimental history of cartography that concentrates on a more analytic form of accuracy based on rigorous statistical and transformational methods. Yet despite the increase in statistical studies there is no general consensus on the definition and theoretical characterization of accuracy in the study of early cartographic materials nor are there any critical studies that review and explain the strengths and shortcomings of the various available techniques. The calculation of accuracy and geometric similarity on early cartography is multifaceted and complex, “for as maps are the end product of a chain of processes, so it is expected that several distinct types of accuracy may have to be accepted within a single map.”14 The error on maps comes from a variety of sources and there have been many attempts to categorize and classify them. MacEachern in his How Maps Work, for example uses three categories of error in his study of method-produced error: (1) interpolation error, (2) generalization error and (3) classification error15. A. Chrisman takes a more general approach to cartographic error citing three broad categories: (1) systematic, (2) random, and (3) blunders16. 12 K. Bretterbauer, “Zur Konstruktion von Verzerrungslinien mittels der multiquadratischen Methode von Hardy.” Kartograpische Nachricten 2 (2005): 83-85 13 T. Fuse and E. Shimizu, “A Study of the Geometric Correction of Historical Maps,” International Archives of Photogrammetry and Remote Sensing 32 (1998) : 545-548. 14 J.B. Harley and M.J. Blakemore, 55. 15 A.M. MacEachern How Maps Work: Representation, Visualization and Design (New York: Guilford Press, 1995) 16 N.R. Chrisman, “Speaking Truth to Power: An Agenda for Change” in K. Lowell (ed.) Spatial Accuracy Assessment: Land Information Uncertainty in Natural Resources. 27-21 4 In their important monograph on concept formation in historical cartography Harley and Blakemore identified three types of accuracy that are important to the study of early cartographic materials, (1) chronometric, (2) geodesic and planimetric, and (3) topographic. Chronometric accuracy is composed of two related parts, (1) the date of the map’s production, and (2) the date of the geographic information that it displays. Geodesic and topographic accuracy are categories that take into account the difference in metric distances between geographic locations on early and modern maps, the arrangement and shape of the various spatial entities displayed on the map, and differences in distinctive landscape features that may have changed over time. Accuracy of any type is especially difficult to define and quantify when dealing with early maps of small scale such as world maps or Portolan Charts. The geographic sources used to create the map may no longer be extant, may have had varying scales and different degrees of geodetic accuracy, and may themselves have been made from composite sources. In characterizing the accuracy of historical maps we also face contextual questions that influence our determination of how accurate an early map could be. What level of accuracy can we expect from the mapping and surveying technology of the period? Was the map meant to be an accurate representation of the current geodetic knowledge of the period? J.B. Harley summarizes the problem as follows: “Accuracy is relative rather than an absolute concept as far as cartography is concerned. It cannot be endowed with a rigid definition such as ‘exact conformity to truth’ or ‘free from error or defect’; not 5 only are maps deliberate generalizations of reality, ‘representative models of the real world’, but all survey and map production processes inevitably introduce error at some stage. [What the historian of cartography] should be concerned with is a systematic study of factors affecting error, and seek to establish their cause and variability and the statistical parameters by which error is characterized.”17 Harley in his The Evaluation of Early Maps: Towards a Methodology18, proposed three types of analysis that would help in assessing the accuracy of early cartographic materials. The first and the only one that we shall concern ourselves with here, is an analysis of the map itself, investigating its mathematical and topographical structure. This methodology also includes the comparison of the map in question with maps that are contemporary with it and an investigation of its accuracy compared to modern equivalents. Accuracy as characterized by Harley is essentially a syntactic and not a semantic question. It is a structural and logical concept and hence can theoretically be measured and quantified to a certain probability both locally and globally on any historical map. Accuracy is phenomenological and formal in that provides a de-contextualized relational structure through which we may compare maps of any era. This is not to say that semantic questions, those dealing with the meaning of the accuracy or inaccuracy of a particular map, do not provide a relational structure. It is merely to say that the formalization of semantic questions is much more difficult and not quantifiable in the same way, as for example, a question of variation in scale might be. 17 18 J.B. Harley, Ordnance Survey Maps: a descriptive manual (Southampton: Ordnance Survey, 1975) J.B. Harley, “The Evaluation of Early Maps: towards a methodology” Imago Mundi 22 (1968): 62-74. 6 An understanding of accuracy in this syntactic sense requires an understanding of the mathematics of measurement and of the statistical properties of coordinate transformations that can be applied to the analysis historical maps. The questions of “How do we measure, extract and quantify information from historical maps?”, and “How do we compare it to their modern equivalents?” are our key concerns. The answers to these questions are essentially problems in the statistical characterization of error in historical measurement processes. These processes may be unknown and difficult to characterize in any historical sense. Transformational techniques provide a phenomenological approach to these processes and provide a statistical baseline for comparison of cartographic materials that is not dependent on causal determinations. David Harvey in his influential Explanation in Geography approaches error and measurement by saying that “…the interesting thing is that in order to measure effectively we require deep analytical understanding and considerable thought regarding the controls necessary and the errors naturally incurred. Measurement may not be a satisfactory end in itself, but we can be sure that in pursuing this end we will turn up problems and difficulties, the solution of which will provide major advances to our understanding. This depends however on a sound understanding of the nature and principles of measurement. Without such an understanding we are plainly lost.19” Harvey’s call for us to acquire a “deep analytical understanding” of measurement is important if we are to learn anything of historical interest from the mathematical methods that are used in calculating accuracy in historical cartography. While these methods are well 19 David Harvey, Explanation in Geography (New York: St. Martin’s Press, 1969) 7 defined mathematically, their interpretation requires a more critical understanding of the processes that produce error and of the statistical measures of accuracy that are applied historical cartography. In relation to this we must consider the questions: (1) How do mathematical coordinate transformations work? (2) Why do we use one type transformation for this map and another for a different one? (3) Are the correlations deterministic or do they have significance tests? (4) Do these results have any historical import or causality? Before we approach the mathematical and statistical questions with any degree of rigor and generality there are two qualifying concepts that must be considered relative to any historical map that we might consider studying. First, “Can we make a judgment on the intent of the cartographer as far as accuracy is concerned?” We must take pause and consider whether or not the map that we are interested in was meant to be an accurate representation of what it visualizes. Second, “How geometrically stable is the medium on which the map is presented?” In applying modern mathematical techniques to digital forms of historical maps this question is important because paper and vellum expand and shrink. The distortion caused by these processes effects not only our calculations but may also obscure the cartographer’s intent. The ability to restore digital images makes it possible to account for the curling, expansion and shrinking of paper and vellum in historical cartometry and is currently an active area of research. 8 Maurizio Pilu20, for example, has used the technique of applicable surfaces to account for curl distortion of scanned images. A surface is considered applicable surface when its Gaussian curvature21 vanishes at every point. Applicable surfaces are isometric with the plane and can be flattened without tearing and thereby allow the accurate modeling of paper deformation. German Diaz and Patricia Steed22 have used GIS software to restore damaged and distorted maps to a high degree of accuracy using polynomial techniques that are akin to those used for georectification. Their techniques allow us to digitally repair and realign map features and make transformational calculations more accurate. The following paper seeks to provide a critical survey of the mathematical and analytical methods available to the cartographic historian and to provide an in-depth theoretical and interpretive foundation for such methods. We shall discuss the linear and nonlinear transformations of planar surfaces that provide comparative correlations of scale, quantification of error, and that can be used to model vector and tensor displacements. We will examine the mathematical formulations of several coordinate transformations, such as Helmert’s, provide an overview of new statistical methods, such as those developed by Huber23 and Hampel24, and consider theoretical 20 Maurizio Pilu, “Undoing Paper Curl Distortion Using Applicable Surfaces”, International Conference on Computer Vision (ICCV’01), Vancover, July 2001. 21 M.P. Do Carmo, Differential Geometry of Curves and Surfaces (New York: Prentice Hall, 1976) 22 German Diaz and Patricia Steed, “Electronic Restoration: Eliminating the Ravages of Time on Historical Maps” ICADL 2005 (Berlin: Springer Verlag, @005) 23 P.J. Huber, “Robust Estimation of a Location Parameter,” Annals of Mathematical Statistics 35 (1964): 73-101 and Robust Statistics (New York: John Wiley, 1981) 24 F. Hampel, Contributions to the Theory of Robust Estimation, PhD Thesis (Berkeley: University of California, 1968). 9 questions relating to the use of bi-harmonic equations and thin-plate splines in surface warping calculations. The hope is that by explaining the mathematical and theoretical details of early and modern methods useful in the statistical comparison of cartographic materials, more cartographic historians will be inspired to experiment with the problem of accuracy and to use transformational tools to study historical maps with more analytical rigor. Even though in this paper I shall assume some knowledge of linear algebra, partial differential equations, and elementary statistics on the part of the reader, those unacquainted with them should have little trouble following the qualitative aspects of the discussion and, in doing so, learn to critically evaluate historical studies that employ these methods. 10 II. Revisiting Bi-dimensional Regression and Related Techniques A geometry is the study of those properties of a set S which remain invariant when the elements of S are subjected to the transformations of some transformation group25. Any discussion of the mathematical and statistical comparison of ancient and early maps with their modern equivalents must necessarily begin with the work of Waldo Tobler. Tobler produced a series of seminal papers beginning in 1965 with the “Computation of the Correspondence of Geographical Patterns” and ending with his 1977 paper and computer program “Bi-dimensional Regression” that presented and developed the theory of statistical map comparison26. Bi-dimensional regression, essentially invented by Tobler, was a statistical regression technique that allowed inferences to be made between two planes from point distributions on those planes. The comparison between points on the two planes was essentially a coordinate transformation much in the way a map projection transforms coordinates from a datum sphere to a planar map. Of the many techniques used to analyze point patterns, Tobler’s regression is unique in requiring that there be a one-to-one mapping between the 25 26 A. Tuller, An Introduction to Geometries (Princeton: Princeton University Press, 1966) 70. Waldo Tobler’s papers on the development, theory and applications of bi-dimensional regression include, “Computation of the Correspondence of Geographical Patterns”, Papers of the Regional Science Association 15 (1965): 131-39; “Medieval Distortions: The Projections of Ancient Maps”, Annals of Association of American Geographers 56 (1966): 351-61; “Bi-dimensional Regression”, reprinted in Geographical Analysis 26 (1994): 187-212. 11 points on the planes27. The requirement that the mapping function (the equations that transform one map to another) is injective assures that there is a correspondence between points on the two planes to be compared and that both surfaces are topologically genus-0, having no holes or singularities. Tobler’s models allowed him to draw distortion grids, to calculate the differences in scale, and to analyze the metric error of maps comparatively, though the interpolation of smooth vector and tensor fields. Tobler defined and investigated four kinds of bi-dimensional regression models: (1) the Euclidian transformation model, (2) the affine transformation model, (3) the projective transformation model, and (4) the curvilinear transformation model. The great utility of these models is that although they can be considered to be elementary applications of empirical differential geometry or as studies of nonlinear transformations in two-dimensional Euclidean space, they are defined by the maps that are to be studied rather than by abstract a priori properties. Tobler thought that his curvilinear lattice model was of the greatest interest because the regression coefficients constitute a spatially varying, but coordinate invariant, second-order tensor field that was defined by the matrix of partial derivatives of the transformation28 between the two sets of points. In his computer program he used a nonparametric approach that allowed visualization of the regression by automatically plotting scatter diagrams, drawing a 27 Tomoki Nakaya, “Statistical Inferences in Bi-dimensional Regression Models”, Geographical Analysis 29 (1997); 169-185. 28 H. Akima, “On Estimating Partial Derivatives for Bivariate Interpolation of Scattered Data”, ACM Transactions on Mathematical Software 9 (1984): 41-52 12 displacement field, and producing a differential interpolation of warped coordinates that was essentially equivalent to Tissot’s Indicatrix. In bi-dimensional regression we compare two planar surfaces by looking at the transformation rules between the two planes based on point distributions. Figure 1 shows the nomenclature that will be used in this paper to describe such transformations. In the figure Z is the explanatory plane whose scale and accuracy is well known (i.e. the modern map) and W is the target plane (early map) whose accuracy is being investigated. The general form of such a transformation is given in matrix notation by the expression  u i   f ( xi , y i )   =  v   g( x , y )    i  i i  The (ui, vi ) denotes each 1….i point on the target plane (dependent variable) and the (xi , yi) are corresponding points on the explanatory plane. The functions f and g are the transformation equations that describe the mapping transformation from Z to W. Bi-dimensional regression as created by Tobler is a technique that estimates the transformation function parameters based on a least squares minimization and a goodness-of-fit measure defined as the bidimensional correlation coefficient R: R= ∑ {( u 1− ∑ {( u i i i ˆ 2 ˆ 2 − u i ) + ( vi − vi ) − ui ) 2 2 i i i } + (v − v ) } 13 ˆ ˆ where u i and vi are predictors of u i and vi , and u i and vi are the averages of the two values. In practice this means is that we are attempting to minimize the function 1 N ˆ ∑(W − W ) N k =1 2 = 1 N ∑(W − f ( Z ) ) . k =1 N 2 The least-squares equation above is solved by taking partial derivatives with respect to the parameters of the function f(Z), setting them equal to zero, and then solving the resulting system of equations. N is the number of points, called tie points, on the two planes that have been selected and linked together. These are the points whose error we want quantify by transforming their locations from the explanatory image to the target image. In order to find the minimum of this equation and hence the error between the two planes or maps we must chose a model by deciding on the form of the function f(Z). This function tells us the exact nature of the transformation and the parameters that we need to calculate. The form of these functions can be conveniently categorized as either linear or nonlinear and it is their selection and manipulation that is the central difficulty in applying transformations to historical cartography. Figure 2 shows the hierarchy of transformations that we shall discuss in this paper. 14 15 Linear Models Translation and Scaling Translation and Rotation M-estimators Affine Projecti ve Nonlinear Models Parametric Nonparamet ric Bivariate Polynomial Radial Basis Functions Elastic Harmonic Functions Thin Plate Splines Figure 2: Hierarchy of Transformation Models Discussed in this Paper 16 A. Linear Models Wherever possible, logical constructions are to be substituted for inferred entities29. Of Tobler’s four models, three are linear transformations. These are the Euclidean, the Affine, and the Projective transformation. The geometrical meaning of linear is that the straight lines that are on the explanatory map are transformed as straight lines on the target map. A Euclidean transform again in matrix form is defined by  ui   α 1   β 1   = +  v  α   β  i  2  2 − β 2  xi    β 1  y i    where αi and β are the constant parameters that need to be i determined. Euclidean transformation calculations result in images that are geometrically similar to the explanatory image. The transformation results in similarity because the parameters only allow scaling, rotation and translation. The model is often written in a trigonometric form known as a Helmert transformation  ui   α1   cos ϕ   =   + m  sin ϕ  v  α    i  2 − sin ϕ  xi    cos ϕ  y i    where m is a scaling parameter and ϕ represents an angle of rotation. The Helmert Euclidean models are extremely efficient to calculate and produce acceptable correlations if the target map does not suffer from too much of unilateral distortion and the data points are not too irregularly spaced. Euclidean transformations consist in a translation that brings the mean locations into coincidence, rotates the principle 29 Bertrand Russell, On Scientific Method in Philosophy (1914) 17 axis by some angle about this location, and produces a uniform change in scaling. This brings the dependent and independent variables into the best coincidence based on simple rigid motions. The affine transformation model is more complex, having 6parameters instead of four, and transforms the explanatory image in a way that results in a dissimilar image from the original. The additional parameters allow for different scaling in each of the two dimensions and for shear in the target image. The affine transformation can be expressed  ui   α 1   β 1  = +  v  α   β  i  2  3 β 2  xi    β 4  y i    Bookstein characterized the transformation effect by saying that an orthogonal pair of directions in the x-y image remain orthogonal in the u-v image and the transformation either stretches or shrinks in these two directions30. These affine transformations can be used when the target map has shear associated with it compared to the explanatory image. An affine transformation for one of the sheets of the 1507 world map by Martin Waldseemüller is shown in Figure 5a. Although these transformations are easy to calculate there are several problems associated with the statistical veracity of the results of which the cartographic historian must be aware. First, we notice that the regression models do not contain an error term whose statistical properties are specified. A term such as this would specify the form of 30 F.L. Bookstein, Measurement of Biological Shape and Shape Change (Berlin: Springer-Verlag, 1978) ch. 6. 18 the error distribution (Poisson, Gaussian et.al.) and would have to be inserted into the model in the form of an error vector. This would appear as an additive vector as in the affine model,  ui   α 1   β 1  = +  v  α   β  i  2  3 β 2  x i   ε i    +   . β 4  y i  η i      Nakaya adopted a normal Gaussian distribution for the error in his study of cognitive maps and used it to produce significance tests for the results of bi-dimensional regressions and to confirm the formal existence of such models31. Tobler’s models have outcomes that are merely descriptive and it is impossible to formally confirm which transformations should be considered as statistically meaningful. A second, and more serious problem, are what are known as outliers. Outliers are points on the target map whose positional error is statistically much larger when compared to the other points. They are especially important to identify in the cases where the selected tie points do not form a regular grid but instead are randomly scattered over the map surface. The effect of these points on the regressions needs to be measured and controlled so that they do not have a larger effect on the correlations than other points whose error is closer to the mean. Techniques for the identification of these points are an active area of research in statistical theory and have spawned the creation of what are known as Robust Estimators. Statistical inferences of the type we are making in historical cartography are based only partly on the selection of the linked points 31 Nakaya (1997). 19 in our studies and the selection of the mapping equation f(Z). Equally important are the underlying assumptions that we make concerning the behavior of the error distributions and their stability. In the models we have talked about so far we have made certain a priori assumptions about the error that we are investigating in the comparison of our two surfaces. Using these linear transformations we can calculate distortion grids, scale and rotation isolines, and the error vectors at each tie point we have selected, but we have yet to into account the stability of these calculations to perturbations caused by points with large errors. What this means in practice is that we have assumed an error distribution among the tie points in which each individual point has an effect on the overall calculation equal to that of the others32. But what if we encounter points of extreme error? Gross errors often show themselves as outliers but not all points that are outliers are gross errors. In the analysis of historical maps a group of outliers may be the most important points in the sample that is being studied. It could be a series of points where the scale on the historical map abruptly changes and thereby signifying the use of a different source by the map’s compiler. Outliers are therefore an ill-defined concept, without clear boundaries. Nevertheless outliers are an extremely useful and important concept as long as we recognize that there is a continuous transition between what might be considered ordinary in the sample and what might be thought of as a point of high error. We need to carefully examine outliers for their meaning and question whether or 32 V. Barnett and T. Lewis. Outliers in Statistical Data (New York: John Wiley, 1994). 20 not they are truly errors that should be dismissed from the sample used in the model calculations. The Least-squares error minimizations that we have discussed for use in historical cartography can behave badly if the error distribution is not normal, particularly when the points are irregularly spaced or if a concentration of error occurs far from the mean33. Several types of estimators have been developed to account for points of large error and to control their effect on the least-squares estimates. These estimators serve to confine local geometric effects of the error points by introducing into the calculations a weighting function. There are many types of estimators but we will confine our discussion to a class of estimators called M-estimators that where first developed by Huber34 in the 1960s. Earlier we defined the least squares problem in historical cartography as the minimization of the residual 1 ri = N 2 ∑ (W − f ( Z ) ) i =1 N 2 , where ri 2 is the residual or the error of the ith data point. The standard least squares method tries to minimize this value for all of the tie points that we have selected and smoothes that value for the entire surface of the target image. This method of calculation can become unstable if the error to be minimized is not normally distributed. Outlying data can sometimes so strongly affect the outcome of this 33 Peter Rousseeuw and Annick Leroy, Robust Regression and Outlier Detection (New York: John Wiley, 1987) 34 Huber (1964). 21 minimization that the estimated parameters become severely distorted. This effect is especially important to account for in data sets containing points that are irregularly spaced. In most applications to historical cartography the irregular spacing of points comes about because in practice we generally cannot pick points of known geographical location on both the explanatory and the target image along a regular grid. The tie points are therefore randomly distributed due to the very uncertainty that we are trying to quantify in our calculations. Hence we are left with an irregularly spaced grid of points whose error and spatial relationship to one another are unknown. M-estimators are part of a large family of estimation functions that try to reduce the effect of outliers in our calculations by replacing the squared residuals least squares function by another function of the residuals min ∑ρ( r ) , i i where ρ is a symmetric, positive-definite function with a unique minimum at zero. This function is chosen to be less increasing as the error in our residuals goes up than the typical least-squares calculation. Intuitively, it is desirable for the error of distant points to have less influence than those which are closer to a particular point in question. One could imagine this as an application of Tobler’s first law of geography, “everything is related to everything else, but near things are more related than distant things.35 The new function serves to 35 W. Tobler, “A Computer Movie Simulating the Urban Growth in the Detroit Region,” Economic Geography 46 ( )234-40. 22 confine local geometric effects, and prevents them from averaging over the entire range of the calculation. To show how this is accomplished we let p = [ p1 ,...., p m ] T be the parameter vector to be estimated on the target map. The M-estimator of p based on the function ρ( ri ) is the vector p which is the solution of the following set of m equations: ∑ψ ( r ) ∂p i i ∂ri j = 0, for all j = 1,…,m, where the derivative ψ ( x) = dρ ( x ) dx is called the influence function. If we now define a weight function w( x ) = ψ ( x) x the equation for the M-estimator becomes ∑w( r )r i i i ∂ri =0 . ∂p j This equation is exactly the system of equations that we obtain if we solve the iterated re-weighted least-squares problem min ∑w(r )r , k− 1 i i i where k indicates the iteration number. The influence function measures the influence that a particular data point has on the calculation of the parameters of the estimate of the point on the target image. For example, in the linear models that where discussed earlier, the influence function measures how much 23 influence a point of very high rotation might have on the overall rotation of the target map. Of the many influence functions available in the statistical literature perhaps the most useful is known as Huber’s function shown below.  r2    u     ρ ( r) =  2 2  , ψ (r) =  , k  sgn ( r ) ⋅ k  k ⋅ r −  2   1   w( r ) =  k  r    r  k Where the first function in each of the brackets is for second is for values r ≥k and the . Huber’s function (figure 3) is a parabola in x  k. the vicinity of zero, and increases linearly at a given level The k is called a tuning constant and it adjusts how close to maintaining a standard normal error distribution the minimization will stay. Figure 3: Huber Estimator 24 MAPANALYST36, a software program for the comparison of maps, allows for use of the Huber robust estimator and recommends a value of 1.5 for k. An asymptotic efficiency of 95% is obtained with the constant at k= 1.345 and might be a better starting point for maps with few outliers. Huber’s function has been found satisfactory in many applications and is the most practical for use historical cartography because it has only one parameter to adjust and the effects on a particular regression can be easily experimented with. The choice of other functions is possible but there are several constraints on the type of functions used. The most important is that the influence function must be bounded and that the estimator has to be unique. This implies that the objective function of the parameter p to be minimized should have a unique minimum, requiring that the individual ρ function is a convex in variable p. The convexity restraint is equivalent to imposing that ∂2 ρ( r ) is nonnegative definite. ∂p 2 Estimators vary in there power to identify and control outliers and several have the power to eliminate far outliers from the calculations altogether. Hampel developed one such estimator that is also used in MAPANALYST37. It uses three variables a, b, and c, in the parameter, influence and weight function and is known as a redecending estimator. The form of Hampel’s estimator is more 36 37 developer F. Hampel, “The Influence Curve and its Role in Robust Estimation,” Journal of the American Statistical Association 69 (1974): 383-393. 25 complex and not intuitively understandable. The equations are given below:   r2   2   a2   a⋅ r −   2   2 ρ ( r) =  a2 a   c − r      ab − + ( c − b ) 1 −  2 2   c − b          2 a a   ab − + ( c − b )    2 2  r    a ⋅ sgn( r )    ψ (r) =  c− r  a ⋅ sgn( r ) ⋅  c−b   0   1   a     r   w( r ) =  a c a . −  ⋅   r c − b c − b   0   where the four equations in each bracket are defined over r  a, a ≤ r  b, b ≤ r  c, and u ≥c respectively. The Hampel function is shown graphically below in figure 4. 26 Figure 4: Hampel Estimator The parameters of the Hampel estimator can be adjusted so as to define a split point at b in the error distribution with a being equivalent to k in the Huber estimator and c defining the error point beyond which the data points will not be considered. This function is very useful for large irregular point distributions on small-scale maps. Many smallscale maps have areas of vastly different scale and accuracy. Geographic regions not close by an area of interest can be effectively eliminated from the regressions thereby allowing for the calculation of better local correlation coefficients. 27 B. Non-Linear Models The contribution of reason is not expressed by the fact that the system of coordination contains unchanging elements, but in the fact that arbitrary elements occur in the system38 Nonlinear transformations are more general than their linear relatives and as such are slightly more complex. These transformations allow not only for rigid rotations, translations and scaling, but also allow curvature to be induced into the target map surface. Tobler considered several types of curvilinear transformations that are nonlinear in his original paper. He began by considering the fact that the differentiable transformation, Z ⇒ W , in our nomenclature from figure 1, can be approximated locally for two surfaces by an affine transformation of the form ∂u ∂u dx + dy ∂x ∂y . ∂v ∂v dv = dx + dy ∂x ∂ y du = The matrix of β coefficients in the linear case is then by replaced by the Jacobian matrix ∂u  ∂x   ∂v  ∂x  ∂u  ∂y   ∂v  . ∂y   Using differential coefficients in the transformation effectively means that every nonlinear transformation is treated as a conjunction 38 Hans Reichenbach, Theory of Relativity and A Priori Knowledge (Los Angeles: University of California Press, 1965) 88-89. 28 of infinitely many local transformations.. These types of transformations are related to Tissot’s Indicatrix in that we are actually defining a second-order tensor made up of the partial derivatives of the components of the transformation39. There are many possible curvilinear models that would fit these transformations but Tobler required that they be restricted to functions that are continuous and one-to-one. The condition of being one-to-one means that for each point on the explanatory map only one point exists on the target image. This is equivalent to the stipulation ˆ ˆ ˆ ˆ ∂u ∂v ∂u ∂v − =J  0. ∂x ∂y ∂ ∂x y The least squares problem then becomes a question of minimizing the function ri 2 = ∑ ( u i − f ( xi , y i ) ) + ( vi − g ( xi , y i ) ) , 2 2 i =1 K [ ] subject to the condition that J  0 . Tobler dismissed any parametric formulations of nonlinear transformations due to what he saw as stability problems and the inability to guarantee the condition that the mapping would be one-to-one. There are however several parametric forms that have been successfully applied in historical cartography40. One such parametric model is polynomial warping. The process of polynomial warping is essentially a mathematical transformation or mapping from a distorted image, such as an early map or a map with an unknown scale or geometric grid, to an explanatory image that is 39 P.H. Laskowski, “Map Distortions and Single Value Decomposition”. Technical Papers, ASPRS-ACSM Annual Convention 4 (1987)42-51. 40 Hessler (2006). 29 well known. The objective is to perform a spatial transformation or warp so that the corrected image can be measured or have a metric placed upon it relative to a known map or grid. The mathematical functions used in the process are polynomials of an arbitrary order that depend on the amount of distortion in the unknown map. The warping process includes all deformations that can be modeled by global bivariate transformations of the form41: u = ∑=0 ∑ =0 a ij x i y j i j N N −i v = ∑=0 ∑ =0 bij x i y j i j N N −i An example of a second-order or quadratic transformation using the above equations is written as u = a 00 + a10 x + a 01 y + a11 xy + a 20 x 2 + a 02 y 2 v = b00 + b10 x + b01 y + b11 xy + b20 x 2 + b02 y 2 The constant coefficients aij in the above polynomial equations can be associated with particular types of distortion as shown in Table 1. Table 1 : Polynomial Coefficients and Related Distortion a00 shift in x b00 shift in y a10 scale in x rotation b01 scale in y rotation a01 b10 a11 b11 a20 41 sheer in x sheer in y y-dependent scale in x x-dependent scale in y nonlinear scale in x rotation rotation George Wolberg. Digital Image Warping. (Los Alamitos: IEEE Computer Society Press, 1990), 30 b02 amn….bnm nonlinear scale in y higher order non-linearity The values for the polynomial coefficients are found by the use of tie points that, as in the linear cases, represent corresponding positions in the known and distorted image whose locations are geographically equivalent. Polynomial methods can be either interpolating or approximating functions depending on the number of tie points selected. An interpolation condition exists when the number of tie points equals the number of unknown terms in the polynomial expression. In this case the system of equations is said to be fully determined. When the number of tie points is greater than the number of unknown terms the transformation coefficients have to be approximated. This situation represents an over determined system of equations and is situation that most often arises in historical cartography. Because the number of polynomial coefficients to be calculated is much less than the number of tie points in any reasonable order polynomial warp, a least-squares error minimization is used to approximate the difference in the two images42. Polynomials of high degree can be unstable, producing unwanted stretching and warping that interfere with any extrapolations. The inclusion of higher-order terms in the transformation however, does allow for much more flexibility and for the incorporation of more image 42 Sung Joog Ahn. Least Squares Orthogonal Fitting of Curves and Surfaces in Space. (New York: Springer-Verlag, 2004). C. Lawson and R. Hanson. Solving Least Squares Problems. (Englewood Cliffs: Prentice Hall, 1974) 31 distortion than do lower order forms. The coefficients for a secondorder transformation can be determined by minimizing E = ∑k =1 δ k2 M = ∑k =1[U ( x k , y k ) − u k ] 2 M 2 2 = ∑k =1[ a 00 + a10 x k + a01 y k + a11 x k y k + a 20 x k + a 02 y k − u k ] 2 M Minimization is achieved by determining the partial derivatives of E with respect to coefficients aij, and equating them to zero. For each coefficient aij, ∂δ k ∂E M = 2∑k =1 δ k =0. ∂a ij ∂a ij By solving for the partial derivatives of E with respect to all six coefficients, we obtain a 6 x 6 symmetric system of linear equations, whose coefficients are summations from k =1 to M, and can be evaluated from the original tie points. This system of equations can be written in compact form as ∑ ∑ N i =0 N −i j =0 aij [∑ M k =1 i i l m l m x k y k x k y k = ∑k =1 u k x k y k . M ] Because of the instability associated with large order polynomials they are not useful for transformations of maps with large amounts of distortion. This is especially true with irregularly and distantly spaced tie points where large oscillations in the calculations can occur. The other problem associated with polynomial expansions is that fact that all of the tie points are used to construct a single distortion model. We saw the limitations of this global method in the 32 linear models without estimators. The problem can be accounted for in polynomial expansions by using weighting functions in the leastsquares minimization. A polynomial least-squares method can be localized by introducing a weighting function Wk that represents the contribution of a point ( x k , y k ) on the explanatory image to (u, v) on the target image. Wk = (δ + ( u − x ) k 1 2 + ( v − yk ) 2 ) 1 2 The parameter δ determines the influence of distant control points on the approximated points. As δ approaches zero, the distant points have less influence and the approximate mapping function follows the local explanatory points more closely. This serves to smooth the mapping function by making it approach the results of a standard least-squares method. For ordinary polynomials the system of equations becomes M M i i m l m a ij ∑Wk ( x, y ) x k y kj x k y k  = ∑Wk u k x k y k . ∑∑  k =1 i =0 j =0  k =1 N N −i A third order polynomial warp was performed on a single sheet of Waldseemüller’s 1507 world map and is shown in Figure 5. Tobler’s original formulation of his curvilinear bi-dimensional model was more complex than the polynomial expansions and relies on a nonparametric approach. In modern terminology it is part of a 33 family of methods known Radial Basis functions that also include Thin Plate Splines43 and Multiquadric-biharmonic theory44. Tobler’s calculations begin by selecting a lattice or grid of some size made up of squares of equally spaced points and laying it over the geographic region of interest. A grid is an important element in the solution of partial differential equations by the finite difference methods that Tobler employed. Finite-difference methods substitute finite quantities for the infinitely small quantities that appear in the differential equations. When Tobler was writing his original paper in 1977 the techniques for grid generation were not as well developed as those today and he was limited to the use of square boundaries45. 43 I.D. Barrodale, D. Skea, et.al. “Warping Digital Images Using Thin Plate Splines and Control Point Matching,” Pattern Recognition 26 (1993): 375-376. 44 R.L. Hardy, “Theory and Applications of the Multiquadric-biharmonic Method” Computers and Mathematical Computing 19 (1990): 163-203 45 Vladimir Liseikin, Grid Generation Methods (New York: Springer-Verlag, 1999) 34 Figure 5: (a) An Affine Transformation (b) Third-Order Polynomial Warp In modern terminology he defined a structured lattice with a clearly defined rectangular boundary46. There are three groups of grid generation techniques that are used in the solution of differential equations: (1) algebraic, (2) differential that are based mainly on the solution of elliptic, parabolic or hyperbolic functions in a selected transformed area, and (3) variational methods that are based on optimization techniques. How one generates the lattice is always a balance between the function to be minimized and interpolated and the smoothness of the lattice. ˆ Initially, a lattice is selected for both of the functions u = f ( x, y ) ˆ and v = g ( x, y ) . Once a grid has been placed over the area of interest, linear interpolation is used within each of the cells of the grid to assign ˆ ˆ points in the interior. The transformation function ( x, y ) ⇒ ( u , v ) is therefore defined everywhere within a frame slightly larger than geographical region of interest. Technically this lattice is larger than 46 Guo Chen, Zhilin Li and Ping Liu, “A Fast Finite Difference Method for Bi-harmonic Equations on Irregular Domains”, Center for Research in Scientific Computing Report 04-09, North Carolina State University, 2005. 35 the convex set of points that define the region of interest on the maps. ˆ ˆ The assignment of values to u and v to the nodes of the mesh can be done in a variety of ways47. Tobler’s method assigns values by solving partial differential equations based on what he terms “observations” to obtain a smooth differentiable mapping. These observations are the locations on the target image. Once values become assigned to the nodes of the mesh, these values along with the linear bivariate interpolation, define the mapping function. In this way Tobler has defined a simple model in which the only parameters are the mesh size, the numerical quantities at the nodes, and the boundary assignment. Tobler uses a two part iterative process to “tune” the lattice and to constrain the points predicted in the transformation. The first part of the iteration makes the estimates agree as nearly as possible with the “observations” and the second part specifies a smoothness function. To understand how the first part works I quote from Tobler, “…Consider a single observation contained inside of a cell of the mesh. Assume some (any) numerical values for the four nodes of this cell. Use the four values to compute (by linear interpolation) an estimate at the same location as the observation. The numerical estimate will, in all probability, differ from the value of the actual observation.” The values of the nodes change during the computations to accommodate observations and the iterations will change the values on neighboring nodes as they accommodate observations in adjacent cells. Nodes that are associated with empty cells and lacking in observational points must be handled differently, 47 Liseikin (1999). 36 “It is this lack of knowledge and lack of uniqueness of adjacencies that makes interpolation from randomly scattered observations difficult. Thus the nodes tuned to the observation act as constraints on the unconstrained ones. This may be visualized as follows. After an initial assignment has been made for each mesh point, replace this number by the value that it would receive by interpolation with its immediate neighbor. Do this in some regular sequence so that the value after an interpolation is made at one point this interpolated value is used to interpolate the value at a neighboring point.” In this way Tobler assures that each point affects its neighbors and is thereby affected by them. One way to introduce smoothness into a warping lattice like Tobler’s is to equate the warping with distortions of an elastic sheet or membrane (rubber sheeting). This is accomplished in a straightforward way by assuming that the energy of a deformation in an elastic medium is given by ∫ ∫2 (ω 1 xx σ xx + ωyy σ yy + ωxy σ xy )dxdy 48 where (ωxx , ωyy , ωxy ) is the strain tensor ωxx = ωyy = ∂u ∂x ∂u ∂y ωxy = 1  ∂u ∂v    + 2  ∂y ∂x    and (σ xx , σ yy , σ xy ) is the stress tensor  ε σ xx =  2 1 −σ ∂v   ∂u   ∂x + σ ∂y     48 Robert William Soutas-Little. Elasticity (Toronto:General Publishing, 1973( 90 37  ε σ yy =  2 1 −σ  ∂v ∂u   +     ∂y ∂x    ∂u ∂v  ε σ xy =   2(1 + σ )  ∂y + ∂x       and ε is young’s modulus and energy is proportional to σ is Poisson’s ratio. If σ = 0 than the  ∂u ∂v   ∂u ∂v  ∫∫ ∂x + ∂y  +  ∂y + ∂x  dxdy .         2 2 There are many variants of these first order partial differential equations that can be used49 and Tobler chose to minimize the equation  ∂u   ∂v   ∂u   ∂v  I =∫∫   +   +   +   dxdy .  ∂y   ∂y  ∂x   ∂x      D  2 2 2 2 The problem of minimizing this equation for a smooth lattice is a classical problem in the calculus of variations50 and finite difference methods51. We extend the integral over the entire plane lattice which is bounded by the contour Γ. Tang and Suen52 found a harmonic transformation that is algorithmically more efficient than Tobler’s but his approach is simple to program and is adequate for applications to historical cartography. These harmonic transformation models are extremely general and can transform any explanatory image into any other arbitrary target image through multidimensional warping. 49 D.J. Burr., “A Dynamic Model for Image Registration”. Computer Graphics and Image Processing 15 (1981): 102-112. 50 Boris N. Khoromsky and Gabriel Wittin. Numerical Solutions of Elliptical Differential Equations by Reduction to the Interface (Berlin: Springer-Verlag, 2000) 9-14. 51 I.M. Gelfand and S.V. Fomin, Calculus of Variations, trans by Richard Silvermann (Englewood Cliffs, N.J.:Prentice Hall, 1963) 22 ff. 52 Y.I. Tang and C.Y. Suen, “ Image Transformation to Nonlinear Shape Restoration”, IEEE Transactions on Systems, Man and Cybernetics, 23(1993): 155-171. 38 In using Tobler’s methods we are looking for a function u(x,y) and v(x,y)that is continuous on some region D, together with its first and second derivatives. Converting Tobler’s equations to be minimized into a variational problem and assuming functional symmetry we have  ∂u   ∂u  I =∫∫   +   + 2 f ( x, y )udxdy .  ∂y  ∂x    D  2 2 If we let u(x,y) minimize this integral and apply the calculus of variations we must then consider the value for the function u(x,y) + α ( x, y ) , where η( x, y ) is continuous on the lattice and equals zero on η the boundary contour Γ. We set the variation in the integral equal to zero d  δI =  I ( u + xη )  =0  dx  α =0 or δI = ∫ ∫2 D  ∂u ∂η  ∂u ∂η +2 + 2 fηdxdy . ∂y ∂y  ∂x ∂x  This can be expanded by means of Green’s formula53:  ∂  ∂u  ∂  ∂u   ∂2 u ∂2 u  ∂u ∂η ∂u ∂η  dxdy − ∫∫ 2 + 2  dxdy + dxdy = ∫∫ η η  + η ∫∫∂x ∂x ∂y ∂y   ∂x ∂x  ∂x  ∂y  ∂y  ∂y       D  D   ∂u  ∂u = ∫  ηdy − ηdx  − ∫∫∆uηdxdy  ∂x  ∂y  D Γ The first integral is equal to zero since the function η( x, y ) is equal to zero on the lattice contour. Taking the second integral and replacing I the first two terms in δ by it we obtain δI = −2 ∫ ∫( ∆u + f )ηdxdy . D 53 L.V. Kantorovich and V.I. Krylov. Approximate Methods of Higher Analysis, trans. Curtis Benster (New York: Interscience Publishers, 1958) 39 Since δI = 0 , whatever the function η may be must satisfy ∆u = ∂2 u ∂2 u + 2 = f ( x, y ). ∂x 2 ∂y . If f(x,y) =0, we obtain the solution, symmetric for u and v ∂2 u ∂2 u ∂2 v ∂2 v + 2 =0 = 2 + 2 , ∂x 2 ∂y ∂x ∂y that are a pair of Laplace equations. These are the conditions for the solution of Tobler’s least-squares problem. To apply these equations to our lattice we cannot solve them directly but instead use finite difference approximations. For the square lattice utilized by Tobler (triangular lattices can also be used) the finite difference approximation requires that the value of any point on our lattice be equal to the average of its neighbors u ij = (u i −ij + u i +1 j + u ij −1 + u ij +1 ) 4 . The average of these points is constantly changing in our minimization as new values are assigned to the nodes during the interpolation process. A stronger constraint on the effect of neighboring lattice points on each other is obtained by requiring that the derivatives of each point be the average of the their neighbors  ∂ ∂ u u ∂ u ∂ u ∂ u = + + + ∂ ∂ ij x x i −1 j ∂xi +1 j ∂ ij −1 x ∂ ij +1 x   .   To solve this equation for the value of u ij one substitutes the finite difference of each derivative and solves for u ij 40 ∂u 11 (u ij +1 − u ij −1 ) = ∂xij 2h Where h is the step size in the x direction of our lattice. This procedure yields the finite-difference approximation for the bi-harmonic equation 20 u ij = 8(u i +ij + u i −1 j + u ij −1 + u ij +1 ) − 2(u i +1 j +1 + u i −1 j +1 + u i +1 j −1 + u i −1 j −1 ) − (u i +2 j + u ij +2 + u i −2 j + u ij −2 ). This equation defines the target minimization at each point on the lattice based on specified boundary conditions54. Tobler’s models allow for the visualization of the regressions between the two maps by showing a smooth interpolation of the deformation. The last methodology for modeling the transformations and warpings between an early and modern map that we shall discuss is another physical analogue to surface deformation known as a thin-late spline55. A thin-plate spline uses a special function that is a solution to a bi-harmonic equation that expresses the energy of a thin-metal plate56. The special function is the surface z ( x, y ) = −U ( r ) = −r 2 log ( r 2 ) , where r is the distance x 2 + y 2 ( ) 1 2 from the origin of a set of Cartesian coordinates. The function is shown in Figure 6. 54 Chen, et. al., (2005) 4. 55 Fred Bookstein, “Principal Warps: Thin-Plate Splines and the Decomposition of Deformations”, IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (1989): 567-585. 56 Christopher G. Small, The Statistical Theory of Shape (Berlin: Springer-Verlag, 1996) 110. 41 Figure 6: Thin-Plate Spline The function U(r) satisfies the equation  ∂2 ∂2 ∆2U =  2 + 2  ∂x ∂y   U = 0 .   For a thin plate subjected to only a slight bending, the bending energy at a point is proportional to  ∂2 z   ∂2 z   ∂2 z   2  + 2  + 2   ∂x   ∂x∂y   ∂y        2 2 at that point, and the function z(x,y) minimizes 2 2  ∂ 2 z 2  ∂2 z   ∂2 z     + 2  +  2  dxdy . ∫2∫ ∂x 2   ∂x∂y   ∂y         R  This quantity is called the binding energy of the spline function. 42 The algebraic solution to the spline problem provides for the analysis of local deformations and is complex so I shall only give a brief overview. Bookstein provides a more detailed presentation along with algorithms57. We begin by letting P1 = ( x1 , y1 ) , P2 = ( x 2 , y 2 ) ,... Pn = ( x n , y n ) , be the points of the explanatory image in an ordinary Euclidian plane according to a convenient orthogonal coordinate system. Take rij = Pi −Pj to be the distance between two points. Define the matrices U ( r12 )  0 U ( r ) 0 K =  21  ... ...  U ( rn1 ) U ( rn 2 ) 1 1 P= ...  1 x1 x2 ... xn ... U ( r1n )  ... U ( r2 n )  , ... ...   ... 0  y1  y2  , ...   yn  and K L= T P P O  where P T is the transpose of P and O is a 3 x 3 matrix of zeros. Let V = (V1 ,.... Vn ) be any vector, and write Y = V 000 ( T ), a column vector of length n + 3. Define the vector ω = (ω1 ,...ωn ) and coefficients a1 , a x ,... a y by the equation 1 L− Y = ω a1 a x a T . The elements of this define a function f(x, y) everywhere in the plane 57 Bookstein, (1989). 43 f ( x, y ) = a1 + a x + a y + ∑ ωiU ( Pi − ( x, y ) ) . i =1 n This function is composed of two parts, one containing the coefficients a is an affine transform representing the behavior of f at infinity, and the other, the summation of the U(r) function for all points. The function f then minimizes the equation for the binding energy  ∂ 2 f  I f = ∫ ∫ 2  ∂x    ∂2 f  + 2 2 2   ∂x ∂y   2   ∂2 f  + 2   ∂y   2     2  dxdy ,   and is called the integral quadratic variation. The value of this double integral is proportional to ωKω T = V ( L−1 KL−1 )V T n n 1 1 where L− is the upper left n x n sub block L− . In applying this to n historical cartography we take points ( xi , y i ) to be points on the explanatory image (modern map) and V then to be the n x n matrix  x′ V = 1 ′  y1 ′ x 2 ... x ′  n ′ ... y ′  y2 n where n is the number of points and the primed quantities are the same points located on the target image (early map). The application 1 of L− to the first column of V T specifies the coefficients of 1, x and y and the U’s for u(x, y) and the application of it to the second column specifies the v(x, y). The resulting vector valued function f(x, y) = [ u(x, y), v(x, y)] maps each point from the explanatory image to its tie 44 point on the target image and is the least warped or has the lowest bending energy of all possible functions. The vector-valued functions are known as thin-plate spline mappings. When the pairing of points on the two planes to be compared are tie points these functions model the local deformation between the two maps. The only major problem with the application of thin-plate splines is that it requires the inversion of a large matrix whose size is defined by the number of tie points. In large data sets this inversion can be computationally challenging. 45 III. Conclusion: Mathematics and the Historians Craft Geometry is a form of rhetoric58 In their highly influential Concepts in the History of Cartography: A Review and Perspective, M.N. Blakemore and J.B. Harley wrote, “a wider assessment of historical cartometry involves the diversity of motives for which such techniques have been developed; an examination of the adequacy of the current techniques in terms of their statistical and display devices; and an ongoing appraisal of the potential of such techniques to the broader objectives of the history of cartography59”. In this survey of the transformational mathematics that allow statistical comparisons and are useful in historical cartometry we have highlighted the strengths and difficulties associated with these methods and in doing so have found that there is no a priori way to decide which method works best in all cases. Although there is no universal technique that works best in all the situations, we can however, apply these methods to obtain meaningful and historically correlated results if we apply them critically. All of the relevant techniques are easily programmable into modern mathematics software such as Maple, Mathematica or MATLab, and the linear transformations can be performed in dedicated cartographic history software such as MAPANALYST. For the historian of cartography comparing the accuracy on a modern and early map has a number of formal and logical difficulties 58 Gunnar Olsen, Abysmal: A Critique of Cartographic Reason, (Chicago: University of Chicago Press, 2007), 99. 59 Blakemore and Harley, 1980. 46 that must be considered before the application of any cartometric process. 1. The identification of tie points on the two maps to be compared is sometimes difficult and once selected are seldom distributed evenly across the surface of the two images. The problem is essentially one of homology. Finding homologous points on two maps may seem trivial but differences in landmark or coastline shape, an incorrect assignment of place names and changing scales require insight into the target image. The selection of points is especially important when using the simplest linear models without M-estimators where results can be error distribution sensitive. 2. The accuracy and error to be tested and compared may not be a relevant concept across the whole surface of the map. Accuracy is seldom evenly distributed especially in the case of maps of small scale that may be composed from a variety of sources. Discontinuities in error from scale changes are not only scalar but are vector quantities whose direction is important to determine. Rapid changes in accuracy across map surfaces may be difficult to handle with simple linear models and are better approached using local non-linear radial basis functions whose correlations may be statistically more relevant, but mathematically more complex to program. Decisions regarding the use of local or global methods cannot usually be made ahead of time and require experimentation 47 in order to characterize the accuracy and decide on a methodology. 3. The substrate that the map is draw or printed on may have undergone distortion through shrinking, folding, or stretching. Distortion of this sort is especially important to consider in the case of environmentally sensitive materials such as vellum. The distortion of the medium not only effects the accuracy of statistical transformations that we are trying to perform but can also obscure the intent of the cartographer. 4. Spatial association does not necessarily imply causality. The warps and correlations that we calculate using cartometric techniques give real mathematical results but these may be extremely difficult to link to any historical meaning, event or cause. Care must be taken not to over interpret the results of these calculations and not to override historical and documentary context in the rush towards accuracy measures. 5. Historical cartometry is a problem in equifinality or process convergence. Similar types of distortion may arise from different causes making it difficult to derive exact causes for particular distortion patterns. 6. Experimentation is necessary and we must be prepared to use a variety of techniques to characterize the accuracy on a single map. Considering the qualifications in the application of cartometric methods to historical materials it is obvious that we cannot hope to 48 achieve exact “truth” using our methods or an absolute visualization and characterization of accuracy60. Instead we assign meaningful notions of probability and statistical measures that are useful for historical comparisons. The analysis of accuracy is an important part of the historical characterization of early maps but the assignment of the adjectives, “accurate” or “inaccurate”, needs to be made more precise in our discourse. The addition of statements like “to this level of confidence by this method” to our use of these adjectives would go a long way to making the results of our calculations repeatable by others engaged in the same research. This more analytic understanding of accuracy requires a much more careful linguistic and conceptual use of these ideas than has been the norm in the historical literature. Currently there are more algorithms being developed that will be useful for cartographic historians as the fields of medical imaging and shape analysis continue to provide fertile ground for the growth these mathematical techniques. There are many more ways to represent shapes on manifolds and to perform the types of similarity and coordinate transforms than we have discussed here and no doubt more applications shall be forthcoming. The areas of Stochastic geometry61 and Poisson processes in Euclidean space are especially interesting62. In the end the researcher who would employ these methods needs to critically consider each map to be studied on a case-by-case 60 Timothy R. Wallace and Charles van der Hanvel, “Truth and Accountability in Geographic and Historical Visualizations”, Cartographic Journal 42 (2005): 173-181. 61 A. Braddeley, “Stochastic Geometrry: an introduction and reading list” International Statistical Review 50 (1982): 179-193. 62 F. Morgan, Geometric Measure Theory: An Introduction (Boston: Academic Press, 1988). 49 basis and would be wise to consider the words of R.A. Skelton, which although written in a different context, can be used as model for the cautions we must recognize in historical cartometry. “The content of the map, as a whole, cannot be assigned confidently to a single phase or horizon of geographical knowledge. Its outlines are in part transcribed from a map prototype or prototypes not precisely identifiable with any extant work; in part they illustrate texts, not all of which have come down to us. The information taken by the author of the map from these sources (graphical and textual) relates to events and concepts of various periods; most of it older by a least a century, and some of it much more, than the presumed date at which the existing map… was made. The delineations in the map before us are separated by long intervals of time not only from the original experience that they reflect, but also from the direct records of it. For the mapmaker was working always at one remove, sometimes (we cannot doubt) at two or more removes, from firsthand records; and it is evident that, to a degree and in senses which it is difficult for us to divine, he exercised his judgment in selection from and in adaptation of his sources, which are themselves partly unknown to us.63” 63 R.A. Skelton et. al., The Vinland Map and the Tartar Relation (New Haven, CT: Yale University Press, 1965) :228. 50 Bibliography Husserl Edmund, The Crisis of European Sciences and Transcendental Phenomenology: An Introduction to Phenomenological Philosophy, trans. David Carr (Evanston, Ill.: Northwestern University Press, 1970) 355. Plimpton, Ramsey Frank. Notes on Philosophy, Probability and Mathematics (Naples: Bibliopolis, 1991) 56. Harley J.B. and M.J. Blakemore, “Concepts in the History of Cartography: A Review and Perspective,” Cartographica 17 (1980): 54 Tissot, Nicolas August Memoire sur la representation les surfaces et les projections des cartes géographiques. (Paris: 1881). Hessler, John “Warping Waldseemuller: A Phenomenolgical and Geometric Study of the 1507 World Map,” Cartographica (forthcoming June 2006) Bangbo Hu, “Assessing the Accuracy of the Map of the Prefectural Capital of 1261 Using Geographic Information Systems,” Professional Geographer 53 (2001): 32-44 G. Fostner and M. Oehrli, “Graphische Darstellungen der Untersuchungs ergebnisse alter Karten und Entwicklung der Verrungsgitter.” Cartographic Helvetica 17 (1998): 35-43 K. Bretterbauer, “Zur Konstruktion von Verzerrungslinien mittels der multiquadratischen Methode von Hardy.” Kartograpische Nachricten 2 (2005): 83-85 T. Fuse and E. Shimizu, “A Study of the Geometric Correction of Historical Maps,” International Archives of Photogrammetry and Remote Sensing 32 (1998) : 545-548. A.M. MacEachern How Maps Work: Representation, Visualization and Design (New York: Guilford Press, 1995) N.R. Chrisman, “Speaking Truth to Power: An Agenda for Change” in K. Lowell (ed.) Spatial Accuracy Assessment: Land Information Uncertainty in Natural Resources. 27-21 J.B. Harley, Ordnance Survey Maps: a descriptive manual (Southampton: Ordnance Survey, 1975) J.B. Harley, “The Evaluation of Early Maps: towards a methodology” Imago Mundi 22 (1968): 62-74. David Harvey, Explanation in Geography (New York: St. Martin’s Press, 1969) Maurizio Pilu, “Undoing Paper Curl Distortion Using Applicable Surfaces”, International Conference on Computer Vision (ICCV’01), Vancover, July 2001. M.P. Do Carmo, Differential Geometry of Curves and Surfaces (New York: Prentice Hall, 1976) 51 German Diaz and Patricia Steed, “Electronic Restoration: Eliminating the Ravages of Time on Historical Maps” ICADL 2005 (Berlin: Springer Verlag, @005) P.J. Huber, “Robust Estimation of a Location Parameter,” Annals of Mathematical Statistics 35 (1964): 73-101 and Robust Statistics (New York: John Wiley, 1981) F. Hampel, Contributions to the Theory of Robust Estimation, PhD Thesis (Berkeley: University of California, 1968). A. Tuller, An Introduction to Geometries (Princeton: Princeton University Press, 1966) 70. Waldo Tobler’s papers on the development, theory and applications of bi-dimensional regression include, “Computation of the Correspondence of Geographical Patterns”, Papers of the Regional Science Association 15 (1965): 131-39; “Medieval Distortions: The Projections of Ancient Maps”, Annals of Association of American Geographers 56 (1966): 351-61; “Bi-dimensional Regression”, reprinted in Geographical Analysis 26 (1994): 187-212. Tomoki Nakaya, “Statistical Inferences in Bi-dimensional Regression Models”, Geographical Analysis 29 (1997); 169-185. H. Akima, “On Estimating Partial Derivatives for Bivariate Interpolation of Scattered Data”, ACM Transactions on Mathematical Software 9 (1984): 41-52 Bertrand Russell, On Scientific Method in Philosophy (1914) F.L. Bookstein, Measurement of Biological Shape and Shape Change (Berlin: SpringerVerlag, 1978) ch. 6. V. Barnett and T. Lewis. Outliers in Statistical Data (New York: John Wiley, 1994). Peter Rousseeuw and Annick Leroy, Robust Regression and Outlier Detection (New York: John Wiley, 1987) W. Tobler, “A Computer Movie Simulating the Urban Growth in the Detroit Region,” Economic Geography 46 ( )234-40. F. Hampel, “The Influence Curve and its Role in Robust Estimation,” Journal of the American Statistical Association 69 (1974): 383-393. Hans Reichenbach, Theory of Relativity and A Priori Knowledge (Los Angeles: University of California Press, 1965) 88-89. P.H. Laskowski, “Map Distortions and Single Value Decomposition”. Technical Papers, ASPRS-ACSM Annual Convention 4 (1987)42-51. George Wolberg. Digital Image Warping. (Los Alamitos: IEEE Computer Society Press, 1990), 52 Sung Joog Ahn. Least Squares Orthogonal Fitting of Curves and Surfaces in Space. (New York: Springer-Verlag, 2004). C. Lawson and R. Hanson. Solving Least Squares Problems. (Englewood Cliffs: Prentice Hall, 1974) I.D. Barrodale, D. Skea, et.al. “Warping Digital Images Using Thin Plate Splines and Control Point Matching,” Pattern Recognition 26 (1993): 375-376. R.L. Hardy, “Theory and Applications of the Multiquadric-biharmonic Method” Computers and Mathematical Computing 19 (1990): 163-203 Vladimir Liseikin, Grid Generation Methods (New York: Springer-Verlag, 1999) Guo Chen, Zhilin Li and Ping Liu, “A Fast Finite Difference Method for Bi-harmonic Equations on Irregular Domains”, Center for Research in Scientific Computing Report 04-09, North Carolina State University, 2005. Robert William Soutas-Little. Elasticity (Toronto:General Publishing, 1973( 90 D.J. Burr., “A Dynamic Model for Image Registration”. Computer Graphics and Image Processing 15 (1981): 102-112. Boris N. Khoromsky and Gabriel Wittin. Numerical Solutions of Elliptical Differential Equations by Reduction to the Interface (Berlin: Springer-Verlag, 2000) 9-14. I.M. Gelfand and S.V. Fomin, Calculus of Variations, trans by Richard Silvermann (Englewood Cliffs, N.J.:Prentice Hall, 1963) 22 ff. Y.I. Tang and C.Y. Suen, “ Image Transformation to Nonlinear Shape Restoration”, IEEE Transactions on Systems, Man and Cybernetics, 23(1993): 155-171. L.V. Kantorovich and V.I. Krylov. Approximate Methods of Higher Analysis, trans. Curtis Benster (New York: Interscience Publishers, 1958) Fred Bookstein, “Principal Warps: Thin-Plate Splines and the Decomposition of Deformations”, IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (1989): 567-585. Christopher G. Small, The Statistical Theory of Shape (Berlin: Springer-Verlag, 1996) 110. Timothy R. Wallace and Charles van der Hanvel, “Truth and Accountability in Geographic and Historical Visualizations”, Cartographic Journal 42 (2005): 173-181. A. Braddeley, “Stochastic Geometrry: an introduction and reading list” International Statistical Review 50 (1982): 179-193. F. Morgan, Geometric Measure Theory: An Introduction (Boston: Academic Press, 1988). R.A. Skelton et. al., The Vinland Map and the Tartar Relation (New Haven, CT: Yale University Press, 1965) :228. 53 About the Author Formerly of the Smithsonian Institution and the National Archives, John Hessler is a Senior Cartographic Librarian and Curator in the Geography and Map Division of the Library of Congress. He has written extensively on the history and the mathematics of classical and medieval cartography, and has published articles and reviews in many journals including Applied Physics Letters, Complex Analysis and Geometry, Imago Mundi, Cartographica, and Coordinates and is the author of a new commentary and translation of Martin Waldseemüller’s seminal text the Cosmographaie Introductio entitled The Naming of America: Martin Waldseemüller’s 1507 World Map and the Cosmographiae Introductio (January, 2008). His research has focused on the use of computer modeling, especially radial basis functions and thin-plate splines, in the analysis of Roman, Medieval and Renaissance cartography. A Fellow of the Royal Geographical Society, he is the recent recipient of a J.L. Heiberg Research and Exploration Fellowship and is researching and mapping the archaeological remains of Roman Surveying and Centuriation in Southern France and Northern Tunisia, primarily interested in the epigraphic and limites remains in the areas outside of Carthage and El Jem. Currently editing the first complete Latin edition of the Corpus Agrimensorum, a compendium of Roman Surveying manuals that dates from the 6th century, he also runs the historical map analysis website www.warpinghistory.blogspot.com . 54
x

Log In

or reset password

Reset Password

Enter the email address you signed up with, and we'll send a reset password email to that address

Academia © 2012