The Vector as a Datastructure - an article written to test the use of procedurally generated figures in articles
Scientific books, as well as simple papers, rely simultaneously on text, figures and various types of symbolic formulas to communicate the ideas as precisely and cleary as possilble: but drawing nice figures often takes a lot of time, whether drawn by hand and than scanned, or done with a graphics program. - it is a lot of tedious work either way.
Since this overwhelming labour can easily kill the creativity, in order to avoid ending up cutting down the number and quality of figures in scientific papers, I worked out a new technology for this: writing a simple procedure in PHP to draw a figure with SVG graphics.
HTML provides at least some support for superscripts and subscripts, variable-size letters, etc. : so that's something... if one wants to, a formula can be made to display nicely.
f(x) = k0
But for figures (where I mean mostly schematic vector-graphics figures) there is no readily available solution. But PHP, togather with SVG, allowed me to device a method to habdle figures too.
Schematic figures are as important as forumlas in scientific papers, so here it is!
( fig.2: A generic vector: a row of N generic containers. )
Refresh the page to get a different vector!
A universally compatible method to handle the schematic figures, ready to visualize on all internet browsers. From which a paper can easily be then exported to a PDF document, with all figures preserved as real vector-graphics images.
And it does the job in a way that it is compatible with ALL internet-service / web-hosting services around the world. And ALL browsers support it ; and even in the remote case in which SVG is does not appear to be supported, that support can be easily installed.
Doing figures for scientific articles this way, is a tecnology similarly to the way Algorithms is a technology: a friend and I have been troubling a lot with figures, and I thought hard to find out a method to handle this real-life problem of creating figures for scientific papers in a way that all internet browsers can visualize them correctly.
So I found this: for simple figures, a little PHP script can be written out of scratch, while for example, to support more complicated figures I will try to make available soon a little toolkit of functions in PHP to handle 3D figures in an easy way. As an example, on the page about terrain-rendering there is an example of a quite nesty 3D figure.
The whole method can be of course easily taylored to specific needs: mechanical schemes, structural chemical formulas, spatial chemical formulas, schemes of electronic circuits, PCB layaouts visualized in 3D, etc... .
Well, have lot of fun creating figures now finaly without overwhelming tedious labour!
What follows, is a sort of fictitious scientific paper, but featuring figures generated procedurally with simple PHP scripts. Where this holds, of course, for 2D and 3D figures alike.
Yes, that's all about it: HTML + PHP + SVG (that's for drawing lines, squares, triangles, etc) !
This is a fictitious article in the way that it does not present any original research, but it explains some basic and well-consolidated things. But with the seme style as it would be the case for an article presenting some cutting-edge research topic. Featuring procedurallly generated vector-graphics figures! So, let's present:
The Vector as a Datastructure
1.The classical vector: a row of numeric values
A vector is a datastructure.
A datastructure is an abstract container of a certain types of information: and of course it's restricted to types of information which are univocously defined... anything that can be translated to quantities and logical relations.
A vector can be viewed according to its classical definition of a row of 2, 3 or more numbers which represent typically some continuous measures, where the order in which these values are taken, is strictly fixed (!).
So, an essentially static datastructure.
Let'see this for first.
( fig.0: representation of a vector according to it's classical definition.)
[ 42.11 | 23.55 | -83.59 | 38.73 | ]
Yes, a figure can also be made of text! Refresh the page to get a vector of different size and random content! Here, SVG was not even needed! Simple but great, isn't it?
Since in the classical scenario a vector represents the position of a point in 2D or 3D space, maybe a vector containing more numbers is a bit impressive to some, due to the prejudices of the past.
If thinking to the position of a point in an N-dimensional abstract space confuses You, then consider the other possible application of a vector holding N values: a vector of generalized coordinates representing the configuration of a system (whatever kind of system: my favourite ones are dynamical systems, for example) which has N degrees of freedom. It still holds true, that the order of the values is crucial!
( fig.1: A classic example of dynamical system that can be simulated with Lagrangian Dynamics: I chose to parametrize its configuration with the angle of each of the 2 wheels. It's actually a bit tricky to get it right because one has to consider how much the internal wheel has managed to roll according to it angle (angle 2), on the wall of the outer wheel. Refresh the page to get different configurations with wheels of different radiuses!)
The vector of generalized coordinates (the 2 angles in this parametrization, expressed in radians) is: [ 2.3 , 1 ]
In this case, the very configuration of the system is like a point in a space which has as many dimensions as many degrees of freedom the system has.
Let's take a system having N degrees of freedom. In this case, the use of a vector of Generalized Coordinates to define the configuration of the system, is a very concrete and practical concept. While thinking of a particular configuration of the seme system as a point having a particular position in an abstract N-dimensional cartesian space, is very abstract. But yes, the 2 concepts are so strongly related: in fact in both cases, we store the system's configuration in a vector.
Generalized coordinates are used mainly in Lagrangian Dynamics, whose invention was published in 1750. So, generalized coordinates, however many there are of them, can be used confortably even without knowing that actually a vector of generalized coordinates is like a vector which expresses the position of a point in an N-dimensional space. Where the point's position is our system's configuration. The abstract N-dimensinal space was invented later, in fact. But we can get some interesting insights to the topic, and we can corroborate the concept of a vector of generalized coordinates. For example if we say that well, some generalized coordinates are allowed to take only values between 2 extremes, well, we can see our vector of generalized coordinates as expressiong a point's position in an N-dimensional parallelepiped (a box)! And the point's position is, the system's configuration. - that's it!
There are some funny applets across the interenet, which allow to visualize various projections of such multidimensional 'boxes', so You can play around with them, to see that these, after all, are not complicated concepts: they aren't either very useful, but it's important to know that the concept of abstract multidimensional spaces comes out in many areas, unintendedly... then one can accept this and laught at how funny it is, or just ignore it.
Now that we have been generalizing quite some bit, to conclude the discussion of the classical vector, let's say that we can also consider a vector of vectors. To obtain an MxN matrix datastructure which is well-known in these modern times in which this paper was written. The far most useful one is the simple 2D MxN matrix, even if the concept can be generalized to multidimensional matrixes.
This is a topic more relevant to Numerical Methods, so in this paper only the bare minimum basement is sketched out. You can even jump the next paragraph if You like.
A Vector can be considered an Mx1 matrix, thus being a monodimensional matrix, while N vectors each of size M, form a 2D matrix: as simple as that. These 2 datastructures suffice to get along with most things which can be treated scientifically: this because even in the worst case, at most we have a (linear) system of linear, second-order differential equations (with constant or non-constant coefficients which are anyway treated as constants in any Numerical Method ). And since a linear system can comfortably be stored in a simple 2D matrix, it's all fine. My favourite system of differential equations, for example, is the one given by the Lagrangian Equations of Motion.
I wanted to mention first the worst case, but let's mention also a simpler case:
any case in which a simple linear system of algebraic equations is sufficient. In some cases one needs to calculate the solution, while in some other cases it is instead just a smart way to express, formalize, and store a linear transformation. There are plenty of examples across the internet, so let's move to the next topic more relevant for the field of Algorithms. Another example is, one of my personal favourites, the linear system used to find the coefficients in the polynomial interpolation: in my short autobiography, "A Nerd's Life", I tell the story I had with polynomial interpolation.
I mentioned it, because in fact a vector can also be used to store a polynomial function.
( fig.2: A polynomial function and how it is stored in a vector holding N rational numbers )
where the vector "containing" it, is this: [ k0 | k1 | k2 | k3 | ... ]
with some concrete values here it is, and below the graph of the polynomial between x=-10.0 and x=10.0 and y in a rande [-100, 100]:
1.776 | 36.845 | 27.695 | -24.305 | -12.368 | 15.086666666667 |]
Refresh the page to see the seme thing with a different value-set!
NOTE: I tried to tune the application to generate "nice" graphs, but multiple attempts may be necessary before getting a nice colorful polynomial.
And a polynomial function in its turn is a sort of datastructure which is able to hold approximations of more excotic functions (such as the exponenetial, cosinus, sinus, logarithm, etc.: mostly all so-called transcendental functions: because in fact they can not be really "contained" (contained in any datastructure according to the topic here discussed ; since another topic is to grasp it with human imagination ) as a polynomical can, just as an irrational number can not be expressed with a finite number of significant digits available ). An approximation which is reliable at least in an [A,B] interval. It's a container able to contain a huge class of functions: this it also defines a large class of "containable" functions. However, it's not the only one: another one is the stuff behing mp3 audio files and related topics, and still another one is the stuff used to draw nice curced lines in many graphics and CAD programs, those things typically calles SPLINE curves and Bézier curves.
You don't like mathematics? - No problem! Have you ever read on a math book that the generic polynomical functionis a sort of datastructure? - certainly not. Not because it's not true, but because stating such a thing this way is more typical of nerds. That's why this paper and the site on which this has been hosted the first time, is not adressed mainly to mathematicians, but to nerds.
2. The generic vector: a row of generic containers for heterogeneous data
But a vector can be also viewed in a wider perspective. In a wider perspective, a vector is a row of containers: and since in the containers we can have indexes, values, and whatnot, the order of the values is not so crucial because we can have indexes which make each container uniquely identifiable. The figure below illustrates the concept.
( fig.3: A generic vector: a row of N generic containers. )
Refresh the page to get a different vector!
This is an abstract datastructure, which we can be sorted according this or that information held in each container ; we can add containers and delete containers. So, such extrapolated concept of vector is a dynamic datastructure, contrarily to the classical vector! As such, it is capable of containing an abstract set: more general than this would be hard to pretend!
Since the order is not important, we can set any rule according which the set's elements shall be ordered, in fact. As much that we can also say that the elements of a given set are not ordered - which means that we can order according to taste.
This is fine. But is the vector as a datastructure really so useful to deserve all this attention? Yes, because, for example, it can act as a container for more complicated datasructures: various tree-structures and the generic graph too.
Representing and storing a tree-structure is one of the biggest attractives, because trees can store hierarchies and subdivisions (yes, because subdivisions exhibit a hierarchic structure ): this requires a little bit of creativity.
Subdivisions don't often get the attention they deserve, but they are important in many areas. On the paper which introduces to terrain-rendering, this is treated more in depth.
A subdivision, essentially is a sort of binary-tree (subdivision of an 1D segment) or a sort of quad-tree (subdiviosion of a 2D rectangular area), or sort of oct-tree (subdivision of a 3D rectangular area), etc ; but a tree of which only the leaf-nodes are considered. The tree itself is not even needed, but it's the primordial prototype of a datastructure which can store a subdivision for us.
Look at the figure!
( fig.4: left: simple graphical representation of a subdivision ; right: generic representation of a quadtree displayed so that its nodes coincide with the centers of the sudivision's blocks ' quad-tree )
Refresh the page to get a different one!
Each node of the tree-structure can be assigned an univocous sort of ID number. For the quadtree, start from 1 node: that has ID number 0 ; and each child-node of it has an ID number of id = 4*parent_id + child_n. And so on. Let's call these ID numbers QIDs : as a short for Quadtree-Node ID-Number... or call it as You prefer. Well, The leaf-nodes (those nodes which don't have children) of the tree-structure define a subdivision.
In fact, and here is the point, a subdivision can be literally stored in a simple vector of integer numbers! - one in which , anyway the order of the numbers is irrelevant: irrelevant becaususe the ID numbers of a tree-stucture of this kind are absolutely univocous, and so are the ID numbers of it's leaf-nodes.
And yes, that's it: a subdivision is par-excellence something which, thus, can be stored in the generic vector datastructure - where we put, in this case, an integer number in each container casel.
fig.5: subdivision of a 2D rectangular area, stored in a generic vector of integer numbers (the order is irrelevant). [ 1 9 10 11 12 3 4 ]
NOTE: the quadrants' numbering is arbitrary. I usually use this convention:
And yes, although a subdivision is excellently geometric, it can be stored in such an abstract datastructure. One in which no geometric quantities occur, and so not even units of measures are used. And we know that representations which do not need units of measures are usually better: because so, units of measures can be introduced comfortably and safely during usage. Thus favouring universality as to theoretical discussions, and compatibility when it comes to computer-software.
Even if from a practical point of view, it can also be useful to store the wole sequence of subdivisions done to arrive at a given final subdivision. As a footnote, it this case, at each step, both the current subdivisions't QID vector, both the vector hlding the QID numbers of the block to subdivide, should be stored. This way the subdivision is easier to reconstitute from a vector. Or, as the Subdivision is stored in a generic vector, it's advisable to store also some x,y values of, say, the lower-left corner of each block and its sidelenght could be stored: so, after, it's always easy to visualize it graphically.
I said a little before that subdivision usually don't get the attention they deserve, even if they have lots of applications and are necessary to formalize lots of real-life things. In fact, let's come to an example, discussed below.
Most of the research I had to do in order to establish my new terrain- or surface- rendering Algorithm consisted in devicing a synamic datastructure which holds a subdivision without having to store an entire tree. On the paper introducing to terrain-rendering, You can read more about this.
Normal trees instead, can be stored as a vector of elements where each element contains some information and 1+n pointers: pointer to the parent-node (mandatory), and n pointers to its children (optional). This is a bit a sort of cheating, since a vector would not be actually needed... so let's leave it at specialized material. Let's go on.
The other attractive is how to represent and store graphs: this is only marginally mentioned here: this requires even more creativity.
( fig.4: For example, let's take the adiacence-list representing the network of links in this site. A graph. ) database my_nerdofalgorithms.adiacence_list: PAGE 0 LINKS TO PAGES: 1 | 2 | 3 | 5 | 6 | 9 | 10 | 11 | 12 | 13 | 14 | PAGE 1 LINKS TO PAGES: 0 | 9 | PAGE 2 LINKS TO PAGES: 0 | 4 | PAGE 3 LINKS TO PAGES: 0 | 9 | PAGE 4 LINKS TO PAGES: 0 | 2 | PAGE 5 LINKS TO PAGES: 0 | 6 | 3 | 1 | 12 | PAGE 6 LINKS TO PAGES: 0 | 7 | PAGE 7 LINKS TO PAGES: 0 | 6 | PAGE 8 LINKS TO PAGES: 0 | 6 | PAGE 9 LINKS TO PAGES: 0 | 1 | 3 | PAGE 10 LINKS TO PAGES: 0 | 11 | 12 | PAGE 11 LINKS TO PAGES: 0 | 5 | 10 | 12 | PAGE 12 LINKS TO PAGES: 0 | PAGE 13 LINKS TO PAGES: 0 | PAGE 14 LINKS TO PAGES: 0 |
This requires to accept some little cheating since we can jump to any node of the graph just going to various elements of the vector, but actually this is not a big issue, since we always need anyway an adiacency-list or a connection-matrix to define the very graph itself. A true graph can be generated thereafter by creating a bunch of independent containers, and then inserting into each the pointers to all the nodes which can be accessed from there: not very practical since we still use a vector to store the containers with their adresses, but we have then a real graph as soon as we delete the vector holding sequentially the adresses of each independent container used as a node of a graph. So that a nodes can then be accessed only from another node... fact because of which unconnected nodes or portions of the graph may be permanently lost, keeping on occupying memory for nothing. So the vector wins anyway. But why?
Simply because the graph as a datastructure is instrinsecally tricky: because it stores relations, and not pieces of easily tangible content! Something to which it's also in place to add that even if we said that a relation is tangible because it's a pointer (an ID-number or memory adress or whatever), relations alone don't make up any big stuff... they are meaningful only if we take the whole set of relations.
And that's more complex because relations are not so much tangible, moreover relations -let's say - connect anyway nodes which are instead sorts of generic containers which may in fact hold pieces of content if we wanted them to. So while vecors are just a bunch of items put in a row of containers, graphs are a sort of mix of items and their relations... and in fact there are different ways to represent this. While instead there is only one way to represent a bunch of items: with a generic vector filld with the items.
So, the ultimate difference is that vectors hold pieces of content (example: ID numbers and describtions, color-codes, etc, of apples in a chest), while graphs can hold either actual structures ( items and their relations to each other ) or just networks of relations (such as hierarchies and networks of references) ; all more abstract and also deeper, more complex stuff).
So the generic vector as a datastructure is good because it's simple and it can allow to represent other, more complex datastructures in a more-less easy way: of course cheating is allowed.
Similar datastructures are lists, stacks, and queues: but they are really simple with respect to generic graphs, so the discussion ends here.
---END OF PAPER---
date: 06 April 2015
by Simon Hasur
paper first hosted on the website The Nerd of Algorithms: