© 2000, Regional Research Institute, WVU. No portion of this Web site can be reproduced on paper or electronically
without express permission from the Regional Research Institute.


Keystone Sector Identification
A Graph Theory-Social Network Analysis Approach

by Maureen Kilkenny and Laura Nalbarte

The research on "Keystone Sector Identification" was financed by the Tennessee Valley Authority (TVA) Rural Studies Program, 1997-8. The TVA Rural Studies Program's mission areas include finding new ways to describe and analyze rural economies, enhancing entrepreneurial/business activity, and supporting innovative community change. TVA funds supported author Nalbarte's work in partial fulfillment of the requirements for her Masters Degree in Economics at Iowa State University, granted December 1998.

Table of Contents
INTRODUCTION
Prevailing Methods
    I-O Based Methods
    Revealed Comparative Advantage
Results of Targeting Critical or Comparative Advantage Sectors
Keystone Species Concept
Preliminary Analyses
GRAPH THEORY and NETWORK ANALYSIS
Digraphs
NETWORK ANALYSIS of a COMMUNITY
Towertown Survey
Macro/Community wide structure
    Density
    Components
Micro/individual roles
    Local Centrality and Prestige
    Global centrality
    Peripheral Entities
    Entities Critical to Network Structure
BLOCK MODELING
    Blocking
    Tie Assignment
THE KEYSTONE TEST
    Fracture Test
    Efficient Path Test
SUMMARY
A Replicable Keystone Sector ID Method
Strengths and Weaknesses
Results and Implications
BIBLIOGRAPHY
APPENDIX I: Entities in "Towertown"
APPENDIX II: CONCOR results
APPENDIX III: BLOCKMODEL SECTORS


INTRODUCTION

This paper presents a new method for identifying keystone sectors in communities, where sectors are broadly defined to include churches, clubs, associations, and public institutions as well as firms and businesses. In an arch, the keystone is the uniquely wedge-shaped stone at the top that is critical for the arch's structural stability. While all other stones in an arch substitute for one another and can be removed (in pairs), the arch will fall apart if the keystone is lacking. The term keystone species was coined by ecologists in the late 1960s for the species uniquely responsible for the structure and integrity of an ecosystem. We now adapt the term for use in community development analysis. In a community, the keystone sector is one that plays a unique role, without which the community is fundamentally and detrimentally altered.

The method presented here allows a researcher to do three main things: (1) quantify a community's structure, (2) describe the roles of all types of entities in that community, and (3) identify the critical, or keystone, entities in the community. This new method can highlight the importance of entities that would not even be considered, much less identified as critical, using the prevailing 'critical sector' or traditional 'industrial targeting' approaches familiar to economists.

Prevailing Methods

I-O Based Methods

There is a long tradition among applied economists seeking useful answers to the question: "What should be given priority for economic development funds or special credit?" Target Industry Analysis, developed by TVA, is the prevailing method used to respond to this question. Target industry analysis (a.k.a. critical sector or turnkey industry analysis) attempts to match industries with locations for the mutual benefit of the firms and the communities. The method keys off the identification of relatively dominant industries (those sectors with relatively high location quotients) that also have a number of input-output linkages with other local industries with high location quotients. The inference is that industries that are already in the area are the best indicators of industries that should be recruited.

Two approaches are typically used to estimate the size of the potential market for the recruited industry. An industrial 'linkages' approach estimates local demand as the national input-output industry × industry intermediate use coefficients times local industry activity. A trade area approach estimates final demand, taking distance, population density, and local income into account using a gravity formula.
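The linkages estimate described above can be sketched in a few lines of Python. This is only an illustration: the coefficient values, output levels, and the function name are hypothetical assumptions, not figures from any I-O table.

```python
def local_intermediate_demand(national_coefficients, local_output):
    """Estimate local intermediate demand for each supplying industry.

    national_coefficients[i][j] is assumed to be the input from industry i
    used per unit of output of industry j (national industry x industry table).
    local_output[j] is the local activity level of industry j.
    """
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, local_output))
        for row in national_coefficients
    ]


# Hypothetical two-industry example (values are assumptions for illustration).
coefficients = [[0.2, 0.3],
                [0.1, 0.4]]
local_output = [100.0, 50.0]
demand = local_intermediate_demand(coefficients, local_output)
```

Each entry of `demand` is the estimated local market for one supplying industry, which is the quantity a recruiter would compare across candidate industries.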

Consider the popular linkages approach. What economists have been calling critical or key sectors are those sectors whose structure of backward and forward linkages creates above-average impacts on the rest of the economy (Sonis et al., 1998). Given a record of a community's industrial input-output transactions, the strength of the effects of a purchasing sector on the sectors it purchases from, or backward linkages, is given by the column coefficients for the sector in the multiplier matrix. The forward linkages, or the effects of a sector on the other sectors it sells to, are given by the row coefficients of the multiplier matrix. Key sectors are those whose column and/or row coefficients are higher than average.

"Any unit change in final demand of a particular sector will affect demand and supply of intermediate inputs. The demand stimulates other domestic sectors to satisfy its intermediate requirements (backward linkage). The supply also stimulates domestic production because it may induce use of its output as an input in new activities (forward linkage)... Interdependencies among productive sectors indicate each sector’s potential capacity to stimulate other sectors. Activities having the highest linkages are considered key sectors because by concentrating resources in them it should be possible to stimulate a more rapid growth of production, income and employment than with alternative allocations of resources." (Cella, 1984)

The strengths of input-output based methods for identifying important sectors are that (i) the biproportional sector-sector transactions table is a popular and useful way to document sectoral interdependencies; (ii) the definition of the system boundary is very operational (i.e., $Y = AY + X \Rightarrow Y = mX$, where $m = [I - A]^{-1}$); (iii) the I-O methods have proven useful in many other contexts.
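As a minimal sketch of the system in (ii), the following Python fragment computes the multiplier matrix m as the inverse of I - A for a hypothetical two-sector economy. It then summarizes backward linkages by column sums and forward linkages by row sums of m. The coefficient values, the helper names, and the use of sums to summarize the column and row coefficients are assumptions made for illustration.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def multiplier_matrix(A):
    """Return m = (I - A)^(-1) for a square coefficient matrix A."""
    n = len(A)
    i_minus_a = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)]
                 for i in range(n)]
    return invert(i_minus_a)


# Hypothetical coefficients: A[i][j] = input from sector i per unit output of sector j.
A = [[0.2, 0.3],
     [0.1, 0.4]]
m = multiplier_matrix(A)

n = len(A)
backward = [sum(m[i][j] for i in range(n)) for j in range(n)]  # column sums
forward = [sum(m[i][j] for j in range(n)) for i in range(n)]   # row sums

# Sectors whose backward linkage exceeds the average would be flagged as "key".
average_backward = sum(backward) / n
key_sectors = [j for j in range(n) if backward[j] > average_backward]
```

In this made-up example only the second sector has an above-average backward linkage, so it alone would be flagged under the column criterion.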

The weaknesses of I-O based methods include (i) I-O data is prohibitively costly to collect; thus (ii) U.S. national tables are released only every five years, with a four-year lag (e.g., 1997 data will be available by 2001); and (iii) there is no comprehensive sub-national data; (iv) the I-O approach is limited in scope: only current account (money-for-things) transactions are considered, and (v) only between business entities. There is no consideration of other types of transactions or relations, such as capital account transactions (money-for-paper), charity, social relations, public goods and services, information flows, or institutional support. There is no consideration of the roles of any non-business entities in a community.

Revealed Comparative Advantage

Another way to determine which industries to target for an area is to survey existing businesses in an attempt to identify the regional comparative advantage. In general, a region's comparative advantage activities are those which employ its relatively abundant factors of production most intensively (Heckscher-Ohlin-Samuelson); or those industries in which local producers are more productive (fewer inputs per unit output) than producers in other regions (Ricardian). The regions with the resources or technology to produce certain items relatively more cheaply should be able to profit most by specializing in those items. Market frictions, however, such as labor or capital immobilities, or high fixed or transport costs, can undermine these hypothetical gains from specialization and trade in comparative advantage goods.

Identifying the comparative advantage sector requires measuring and comparing relative factor abundances, or relative sectoral factor productivities, in the region to those of the trading partners. Alternatively, some analysts assume that a region's comparative advantage sectors are revealed by their existing production and trade patterns. The deduction is that if a region is specializing in and exporting something in a competitive market economy, it must be its comparative advantage sector.

This assumption means that rather than attempting to assay local factor endowments or productivity objectively, a direct survey approach is often used to identify a region's comparative advantage sector. Businesses are asked about the location's proximity to input suppliers and/or output consumers, labor force costs, energy costs, and the industrial climate established by the state/local government, among other things. Their answers are used to profile a region. The validity of the approach relies on the objectivity of the survey respondents and their abilities to compare their chosen locale's features to the features of other locations. Their answers to the questions are then interpreted in comparison with other, previous studies. Two types of conclusions can be drawn. One is that more of the existing comparative advantage activity will be better. The other is that industries which have located in areas with comparable profiles may be interested in locating in the study area as well (Goode and Hastings, 1988).

Results of Targeting Critical or Comparative Advantage Sectors

The lists of businesses in industries that may be recruited to the region, identified by either of the methods described above, are then screened for desirability for the community. The criteria include wage rates or value-added per employee, demand for the unemployed portions of the local labor force, ability to serve regional excess demands, expected growth potential, and other location specificities. The highest ranked industry is then targeted for recruitment. The process is considered successful if the business comes. The degree of success is typically measured by the number of workers initially employed locally by the recruited firm.

The results of even the "successful" industrial targeting programs, however, have been criticized for merely crowding out private activity, or for being an imbalanced use of funds (Jacobs, 1985). The 'crowding out' argument is that if no market failure has been identified, why should public support be needed by a key or comparative advantage sector that is valued and profitable in a location? If there are no market failures, public support does only what the market would have done anyway (crowding out), and at a higher cost. The 'imbalanced use of funds' argument is that the approaches used to identify the target industries probably lead to too much capital investment in that one type of industry. The particular over-investment outcome depends on the sector identification method employed.

The "critical sector" and "revealed" approaches are biased towards the outcome that all locations have the same mix of industries, especially those industries with high linkages according to nationwide I-O data. In that case, there may be too much investment in such sectors economy wide. The "comparative advantage" approach is biased in the opposite direction, away from industries common in other regions, and towards more unique industries already thriving in the region of interest. In this case the targeting of public support may lead to overinvestment in the region-specific industry, with an immiserizing growth outcome. Neither key sector ID approach attempts to ascertain whether or not the local industry has already achieved optimal scale in the location or in the relevant market. Neither approach attempts to ascertain why further expansion in an industry which started in response to private incentives is not possible without public subsidy.

In addition to unrealized income growth, the other outward signs of too much investment in one type of activity include (i) environmental degradation, (ii) stagnation or decline in other sectoral activity, (iii) leakages of capital-related income, and (iv) population migration out of the area (Jacobs, 1985).

Given that the inspiration for this new way to identify key sectors comes from ecological sciences, an appropriate analogy is that industries induced to locate in an area by subsidies and tax holidays are "predators." In addition to using up tax revenues, the industries may 'feed' on the earning capacity of privately initiated local firms in the same industry. Subsidized industrial expansion can eclipse the existing industries in markets for local labor, similar inputs, or similar consumers. Excessive expansion of comparative advantage sectors may destroy natural resources. The targeting of a limited type of industry reduces the diversity of the local economy. A less diverse economy is more sensitive to exogenous shocks originating in those sectors. Finally, especially if it is the major employer, an industry designated as "critical" can more effectively threaten to abandon the community if substrate supplies, subsidies, or other attractions diminish (Wolhgemuth and Kilkenny, 1998). If such industries succeed in obtaining further public subsidy, the "predator" may become a "parasite."

Keystone Species Concept

The dominant industry and predator or parasite analogies introduced above are the natural point of departure for further elaboration of useful concepts from ecology. As noted in the introduction, this project develops the keystone species concept coined by natural scientists for application by social scientists in the analysis of economic systems. We adopt the ecologists' definition of the keystone as being unique and without which the system structure would be fundamentally altered. We develop our own approaches for describing the roles of entities in a system, and our own test of 'keystoneness,' using techniques of graph theory from mathematics and social network analysis from sociology. The new methods are demonstrated in an application to data about a single community.

The concept keystone species was first coined by ecologist Robert Paine in the late 1960s, in his identification of a predator as the critical species in an ecosystem. Paine described the effect of removing a species of starfish, the main predator on mussels, from a rocky intertidal zone. Without the starfish, the whole system changed. The mussels proliferated, leading to a significant change (increase) in biodiversity. Paine called the starfish a keystone species:

"The species composition and physical appearance were greatly modified by the activities of a single native species high in the food web [starfish]. These individual populations are the keystone of the community's structure, and the integrity of the community and its unaltered persistence through time...are determined by their activities and abundances." (Paine, 1969)

Over the decades, the keystone species concept has evolved beyond its focus on predators, but it remains "poorly defined and non-specific in meaning" (Mills, Soule and Doak, 1993). Natural scientists have tended to proliferate studies arguing the 'keystoneness' of their favorite organism. The emphasis in that research has been on single species as the unit of observation, not on the ecosystem or the patterns of interdependence among all the fauna and flora in one system.

In addition, the implied assumption that there will always be one and only one keystone in a system is suspect. Plants, micro-organisms, prey, the physical environment, and other aspects in an ecosystem may all play critical roles. As Princeton University ecologist Simon Levin says, "focusing on particular species often misses a great deal of what's important in an ecosystem" (Stone, 1995). By analogy, in the analysis of community economic systems, the exclusive focus on only one type of species (private sector businesses) or one type of relation (current account money flows) may also 'miss a great deal of what is important.'

It is instructive to note that the original 'top-down' keystone species concept in Biology has been fundamentally challenged by a more holistic, 'bottom-up' concept of functional groups, which recognizes that small organisms on the bottom of a food chain can also determine an ecosystem. The functional group concept is exemplified by the work of Robert Stenek, who studied a "curious pattern of continuity in the face of drastic change" among algae clusters floating around the Caribbean island of St. Croix.

'Almost none of the dominant species I'd find in one season would I find as dominant in the next,' Stone quotes Stenek, but "what was going on in the mats varied little: They consumed nutrients at a fairly constant rate and resisted similar types of predators. Different algae, apparently, were performing the same biological role" (Stone, 1995).

It is now widely understood that whole suites of organisms have similar functions in an ecosystem, and they can be modeled as a single group rather than species by species.

The functional group notion in biology corresponds to the economic notion of substitutes: inputs in production, or outputs in supply. Ecologists are now trying to develop methods to identify influential functional groups; i.e., keystone functional groups. By analogy, we economists seek to identify keystone sectors, where a 'sector' or group of sectors corresponds to a 'functional group' whose members substitute for one another.

In what follows, we define a community to be the set of businesses and institutions in a single geographic location, such as a town or city. We define the keystone sectors to be the types of businesses or institutions, private or public, which play a unique and critical role in achieving the objectives of a community. By critical we mean necessary for the existing structure: without that type of entity, the structure is destroyed. We use a subset of graph theory (Berge, 1962) called social network analysis (Wasserman and Faust, 1994) to describe the structure of the community system. We also use it to analyze the interrelationships between varied entities and to identify "species" or types. We introduce new tests of the sensitivity of the system to the absence of an entity.

Preliminary Analyses

As a preliminary step, we analyzed a survey of small businesses (Kilkenny, Nalbarte and Besser, 1999) in 30 small communities in Iowa. We sought a way to document the interdependencies among businesses, institutions, and local citizens. In that work, we tested and found statistical support for a "social embeddedness" hypothesis. We found that businesses whose owners or managers make more donations to their community and/or serve as volunteers or elected public servants feel twice as successful as those who do not. The service, however, must be reciprocated by community support of the business. The findings hold regardless of the type of business activity or the size of the town: there is no evidence of differences across sectors or across towns by size.

We concluded from our work on "Reciprocated Community Support" that the usual market interactions and economic characteristics of business activity, typically thought to be most relevant in explaining firm success, were far less relevant than non-market interactions. This was a surprising finding, but not an unprecedented one. Kranton (1996) has argued that non-market reciprocal exchanges are a substitute for search and/or transaction costs. Non-market transactions can also help reduce damaging opportunistic behavior among strangers.

Economists concerned with the political economy of growth have also focused recently on the relationship between social capital and macroeconomic performance (Putnam, et al., 1993; Knack and Keefer, 1997). Almost all of these studies attempt to measure a dependence of macroeconomic performance (growth in regional or national gross product) on region-wide indicators of associational activity, trust, and civic cooperation. The hypothesis is that high-trust societies waste fewer resources protecting themselves from malfeasance; have cheaper, more credible and stable government institutions; have more access to credit; and risk more on innovation, all of which lead to higher rates of national investment and national growth (Fukuyama, 1995; Knack and Keefer, 1997).

In sum, our preliminary work verified the social capital arguments by sociologists and economists about the importance of non-market interactions and social institutions in the economic success of a community. This further compelled the search for a way to identify a community's key sector(s) that considers a broader range of entities (not just businesses) and a broader range of relations (not just sale/purchase money flows) than the prevailing methods. However, in the data for our preliminary empirical work, only one side of an interaction was interviewed. That analysis relied on one side’s opinions of how the other side feels. The experience clarified that to test hypotheses concerning reciprocity, it is necessary to sample both sides of relations.

GRAPH THEORY and NETWORK ANALYSIS

This section introduces a method to identify keystone sector(s) from among many possible types of entities, taking into account numerous possible types of interdependencies in a community. The method must (1) describe interdependencies within and among agents, institutions, and sectors in communities, (2) determine the degree of importance of an agent or groups of agents, and (3) show the sensitivity of the structure of the community to the absence of particular types of agents. Network analysis methods are appropriate for all three tasks.

Network analysis has been widely used in transportation system research (Hanson and Huff, 1986; Koppelman and Pas, 1985; Wright, 1979) and anthropology and sociology research (e.g. Granovetter, 1973, Freeman, 1977). Applications of graph theory or network analysis to identifying critical economic sectors, however, are scarce. The few attempts include the early work of Campbell (1975), and some recent initiatives by Kauffman (1988), Roy (1994, 1995), and Sonis and Hewings (1997).

Social Network Analysis

Statistical models based on graph theory have also been used by researchers to study social networks for almost 60 years. The goal of these models was (and remains) the quantitative examination of the stochastic properties of social relations between the actors of a particular network (Wasserman and Pattison, 1996). Applications range from studies of interactions between individuals: interpersonal relations, friendship, leadership, etc; to studies of interactions between groups: global studies of communities, studies of the élite and political behavior; and studies of power-sharing.

"Social network analysis may be viewed as a broadening or generalization of standard data analytic techniques and applied statistics, which usually focus on observational units and their characteristics. A social network analysis must also consider data on ties among units" (Wasserman and Faust, 1994).

The basic feature of network analysis, as distinct from the more usual data analytic framework common in the social sciences, is the use of relational information to study or test hypotheses. A relation is the collection of ties of a specific kind among a set of entities or actors. Relational data can include, for example, data on family ties, interactions between people, or individuals' attitudes about other individuals in a group. The relational link between a pair of actors is called a tie. A tie is a property of the pair; therefore a tie cannot be thought of as pertaining simply to an individual actor.

Since ties exist only between pairs of actors, the relevant unit of analysis is the dyad. A dyad consists of a pair of actors and the possible ties between them (Wasserman and Faust, 1994). For example, a father and son are a dyad, and their familial relation is implied by their labels. Two cities connected by a commuter’s travel pattern between them, and a retail store and customer, are also dyads.

For social network analysis, relational data are collected by observing or interviewing individuals about their interactions with the others in the set or network. The unit of observation is an individual from whom we obtain information about their ties with other actors. For economic analyses, relational data may include data on the values of purchases or sales between firms, the existence (or lack) of contractual agreements or information flows between agents, or the value, volume, or frequency of international trade flows among countries.

Data on social interactions between entities presented in a matrix are called a sociomatrix. By convention, the rows of the sociomatrix represent the sending actors while the columns represent the receiving actors (note: this is the opposite of Social Accounting Matrix (SAM) conventions). A sociomatrix need not be square: the set of sending and receiving actors may be the same or different.

Ties or relations are classified along two dimensions: (i) dichotomous or valued, and (ii) directional or non-directional. A dichotomous relation is recorded as either the presence or absence of a tie between two entities in the set. A valued relation records not only the existence of a relation but also its intensity or frequency (Wasserman and Faust, 1994). An example of a dichotomous relation is public safety agency A's provision of services enjoyed by business B. Since a public good is by definition non-rival, use or non-use is the relevant measure (rather than quantity used). An example of a valued directional relation is the dollar value of exports recorded as being shipped from country A to country B.

A directional relation has an explicit origin and destination. A non-directional relation is non-specific about the origin or destination of the flow on the link (Wasserman and Faust, 1994). By convention in graph theory, a non-directional relation is represented by an edge. It is illustrated by a line between the interacting agents that has no arrowhead. A directional relation is represented by an arc. An arc is a line between entities with an arrowhead at the destination. For example, if country A exports manufactured goods to country B, the direction of trade is from A to B, reflecting B's imports from A. If the direction is not recorded, the trade flow could be misconstrued to be about exports of B. Graphically, the directed relation of A's international trade exports to B is shown as A -> B. The matrix form for recording this relation would have an entry in the A row and the B column (opposite to conventional SAM accounting).
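The recording convention can be illustrated with a short Python sketch that builds a valued, directional sociomatrix from a list of ties and then reduces it to a dichotomous one. The actors, trade values, and function name are hypothetical and serve only to show the row (sender) and column (receiver) convention.

```python
def sociomatrix(actors, ties):
    """Build a square sociomatrix: rows are senders, columns are receivers.

    ties is a list of (sender, receiver, value) triples; absent ties stay 0.
    """
    index = {name: k for k, name in enumerate(actors)}
    matrix = [[0] * len(actors) for _ in actors]
    for sender, receiver, value in ties:
        matrix[index[sender]][index[receiver]] = value
    return matrix


# Hypothetical export flows (values are assumptions for illustration).
actors = ["A", "B", "C"]
valued_ties = [("A", "B", 250), ("C", "A", 40)]

valued = sociomatrix(actors, valued_ties)

# A dichotomous version records only the presence or absence of each tie.
dichotomous = [[1 if value else 0 for value in row] for row in valued]
```

The export from A to B appears in the A row and B column; the B row and A column stays zero, so the direction of the flow is not lost.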

Digraphs

As suggested above, relational data can be presented not only in matrix form but also by graphs. If the relational ties have direction (are arcs) the graphs are called directed graphs or digraphs. The entities in digraphs are called nodes and the relations are represented by arcs. Formally:

A digraph is a finite, non-empty set $N = \{n_1, n_2, \ldots, n_g\}$, whose elements $n_i$ are called nodes, together with a set $A = \{a_{12}, a_{13}, \ldots, a_{1g}, \ldots, a_{g-1,g}\}$ of ordered pairs $a_{ij}$, called arcs, where $n_i$ and $n_j$ are distinct members of $N$ (Robinson and Foulds, 1980).

The graphic form shows how each agent in a system relates to all other agents in the system. If the number of agents (g) is not too large, a graph is an efficient way to show which agents are connected to which others, and which are isolated; which are senders or receivers. Figure 1 is a digraph of 73 agents. It is a good example of the case where the number of agents is "too large." For a comprehensive explanation of various ways to illustrate networks, see Linton Freeman's webpage or his article in the Journal of Social Structure (Freeman, 2000).

Figure 1. Digraph of the INFORMATION relation among Towertown entities

Figure 1 shows a multitude of dichotomous, directional ties between nodes. Clearly, the more fruitful form in which to analyze these ties is as a matrix of zeros and ones. Adjacency is the graph-theoretic expression of the fact that two agents, represented by nodes, are directly related, tied, or connected with one another. Formally, given agents $n_i$ and $n_j$ in a set of agents $N$, and the set of arcs $A = \{a_{ij}\}$ denoting the existence of relations from agents $i$ to agents $j$, agents $i$ and $j$ are adjacent if either of the two arcs, $a_{ij}$ or $a_{ji}$, exists. Given the digraph $D = (N, A)$, its adjacency matrix is $A(D) = \{a_{ij}\}$, where $a_{ij} = 1$ if the arc from $n_i$ to $n_j$ exists, and 0 otherwise, so that direction is preserved in the rows (senders) and columns (receivers).

Note also that Figure 1 shows no reflexive ties. A reflexive tie is one that an entity has with itself. In an adjacency matrix or a sociomatrix, its presence would be recorded by $a_{ii} = 1$. In a digraph, reflexive ties are drawn as arrows that originate and end on the same node, curving back on themselves.

Looking at Figure 1, it seems that there are many interactions among the entities, but in fact the graph is far from complete. A complete graph is one in which all the actors have two-way ties to all other actors. The density of a complete graph is 100%; in the digraph in Figure 1, only 24% of the possible ties are present. Formally, the density $D$ of a digraph is the actual number of non-reflexive arcs in proportion to the maximum possible number of non-reflexive arcs: $D = \sum_i \sum_{j \neq i} a_{ij} / [N(N-1)]$, where $N$ here denotes the number of nodes. Note that the numerator is the count of actual non-reflexive ties, that is, $a_{ij}$ summed over all $i \neq j$. In the case of a complete network, all nodes are reciprocally adjacent to one another, and all off-diagonal elements of the sociomatrix are equal to one.
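The density calculation is simple enough to sketch directly. The fragment below counts the non-reflexive ties in a dichotomous sociomatrix and divides by the number of possible non-reflexive ties, N(N-1). The three-node matrices and the function name are hypothetical examples, not the Towertown data.

```python
def density(matrix):
    """Share of possible non-reflexive ties that are present in a square 0/1 sociomatrix."""
    n = len(matrix)
    ties = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


# Hypothetical three-actor network with three of the six possible ties.
sparse = [[0, 1, 1],
          [1, 0, 0],
          [0, 0, 0]]

# Complete network: every actor has a two-way tie with every other actor.
complete = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

sparse_density = density(sparse)
complete_density = density(complete)
```

Diagonal entries are skipped, so any reflexive ties recorded in the matrix do not affect the result.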

The nodes at the top of the oval in Figure 1 are the sources or sinks for the largest number of ties. The strength of a node as a source or a sink in a system is most easily measured using the sociomatrix or adjacency matrix data. The number of arcs beginning at a node is called the outdegree of the node. Given dichotomous sociomatrix data, outdegree is the row sum for the node:

outdegree of actor i = $\sum_j a_{ij}$ (1)

The number of arcs ending at a node is called the indegree of the node. The indegree is measured by the column sum for the node in a dichotomous sociomatrix:

indegree of actor j = $\sum_i a_{ij}$ (2)

According to social network analysis, the nodes at the top of the oval in Figure 1 are the most prominent because there are more links between them and the others. There are four measures of prominence: local centrality, local prestige, and global centrality in two forms: closeness and betweenness. Local centrality reflects the number of direct transmissions, and is thus measured simply by the outdegrees (or row sums) for each actor. Local prestige reflects the number of direct receipts, and is thus measured simply by the indegrees (or column sums) of each actor. Since these measures are based on the degrees of the nodes they are also known as degree centrality and degree prestige (Wasserman and Faust, 1994).
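Degree centrality and degree prestige follow directly from the row and column sums. The sketch below computes both for a hypothetical three-actor dichotomous sociomatrix; the data and function names are assumptions for illustration.

```python
def outdegrees(matrix):
    """Row sums: number of arcs each actor sends (local centrality)."""
    return [sum(row) for row in matrix]


def indegrees(matrix):
    """Column sums: number of arcs each actor receives (local prestige)."""
    return [sum(column) for column in zip(*matrix)]


# Hypothetical dichotomous sociomatrix: rows send, columns receive.
ties = [[0, 1, 1],
        [1, 0, 0],
        [0, 0, 0]]

centrality = outdegrees(ties)
prestige = indegrees(ties)
```

In this made-up network the first actor sends the most ties, while each actor receives exactly one, so the two measures rank the actors differently.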

Actors may be directly related (with a one-step arc between them) or indirectly related through a third, fourth, or more remote entity. A path is a sequence of arcs from one node to another. When multi-step paths are taken into account, agents can be characterized more richly than just as sources or sinks. With respect to a path, an actor can be a transmitter (the arc is away from the node), a carrier (there are at least two arcs, one toward and one away), or a receiver (the arc is toward the node).

For example, consider the following digraph illustrating the relation between four entities A, B, C, and D:

A → B → C
    D

In the above, A is a transmitter, B is a carrier, and C is a receiver; while D is isolated. An actor is isolated when there is no arc that relates the actor to any other in the network. The adjacency matrix for this digraph is:

        A   B   C   D
    A   0   1   0   0
    B   0   0   1   0
    C   0   0   0   0
    D   0   0   0   0

which shows that A relates to B, and B relates to C, while C does not relate OUT (only IN), and D does not relate at all.

Formally, a node is a proper source or transmitter if its indegree is zero and its outdegree is non-zero; that is, if the column sum is zero while the row sum is greater than zero. A node is a proper sink or receiver if its indegree is non-zero and its outdegree is zero, that is, if the column sum is greater than zero while the row sum is zero. A node is isolated if both indegree and outdegree are zero (Wasserman and Faust, 1994).
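The degree-based definitions above can be illustrated with a minimal sketch on the ABCD example. The helper name `role` and the label "carrier" for any node with both incoming and outgoing arcs are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

labels = ["A", "B", "C", "D"]
# Adjacency matrix of the digraph A -> B -> C, with D isolated.
A = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])

outdegree = A.sum(axis=1)  # row sums, as in equation (1)
indegree = A.sum(axis=0)   # column sums, as in equation (2)

def role(out_d, in_d):
    """Classify a node from its outdegree and indegree (hypothetical helper)."""
    if out_d > 0 and in_d == 0:
        return "transmitter"
    if out_d == 0 and in_d > 0:
        return "receiver"
    if out_d == 0 and in_d == 0:
        return "isolate"
    return "carrier"  # at least one arc toward and one arc away

roles = {name: role(o, i) for name, o, i in zip(labels, outdegree, indegree)}
print(roles)
```

For this matrix the sketch labels A a transmitter, B a carrier, C a receiver, and D an isolate, matching the description of the digraph.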

The distance between entities in a network is measured by the length of the path between them. Of course, there may be many paths, but the shortest path is the relevant one. The shortest path from $n_i$ to $n_j$ is called the geodesic. When nodes i and j are adjacent, or tied in one step, the sociomatrix element $a_{ij}$ is non-zero and the shortest path has length one. When nodes i and j are not adjacent but are related by a two-step path, the ijth element of $A^2$ will be non-zero. If the shortest path is in three steps, the ijth elements of $A$ and $A^2$ will be zero, but the ijth element of $A^3$ will be non-zero. In general, the pth power of the adjacency matrix shows the existence of all directed p-step paths. Consequently, the geodesic distance, $d(n_i, n_j)$, is measured as the first power p for which the ijth element of $A^p$ is non-zero (Wasserman and Faust, 1994).
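The matrix-power rule for geodesics can be sketched as follows, again on the ABCD example. The function name `geodesic_matrix` is hypothetical, and the choice to mark unreachable pairs with infinity is an assumption made for illustration.

```python
import numpy as np

def geodesic_matrix(A):
    """Return D where D[i, j] is the first power p with (A^p)[i, j] non-zero.

    Unreachable pairs are left at infinity (an assumption of this sketch);
    the diagonal is set to 0.
    """
    n = A.shape[0]
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0)
    power = np.eye(n, dtype=int)
    for p in range(1, n):  # a geodesic has at most n - 1 steps
        # Only the zero / non-zero pattern of A^p matters, so the product
        # is reduced to 0/1 at each step to avoid large path counts.
        power = ((power @ A) > 0).astype(int)
        newly_reached = (power > 0) & np.isinf(D)
        D[newly_reached] = p
    return D

# Digraph A -> B -> C, with D isolated.
A = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
D = geodesic_matrix(A)
print(D)
```

Here the distance from A to C is 2, because the corresponding element is zero in the adjacency matrix but non-zero in its square.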

Now we can define the last two measures of prominence, the global centrality measures. The ubiquitousness of an actor's ties qualifies them as global and, if the actor is in the shortest paths between others, the actor is central. The shortness of the paths from that actor to others is measured by closeness. The proportion of intermediary roles an actor plays measures betweenness. Both measures rely on the geodesic measure of distance.

Closeness, $C(n_i)$, is the inverse of distance. The shorter the distances between node i and other nodes, the more central node i is. Formally, $C(n_i) = \left[\sum_{j=1}^{N} d(n_i, n_j)\right]^{-1}$, where $d(n_i, n_j)$ is the geodesic (shortest path) distance between entities i and j in the network, and N is the network size.
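A minimal sketch of the closeness formula follows. The five-arc digraph is a hypothetical example chosen so that every node can reach every other, and reporting 0 for a node that cannot reach all others is an assumption of this sketch, since the formula is undefined when some distance is infinite.

```python
from collections import deque

def bfs_distances(adj, source):
    """Geodesic distances from source along directed arcs."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj):
    """Inverse of the summed geodesic distances from each node."""
    n = len(adj)
    result = {}
    for i in adj:
        dist = bfs_distances(adj, i)
        if len(dist) < n:
            result[i] = 0.0  # assumption: some actor is unreachable
        else:
            result[i] = 1.0 / sum(dist.values())
    return result

# Hypothetical digraph: 0 -> 1 -> 2 -> 0, 2 -> 3 -> 0.
adj = {0: [1], 1: [2], 2: [0, 3], 3: [0]}
scores = closeness(adj)
print(scores)
```

In this example node 2 reaches the others in 1, 1, and 2 steps, so its closeness is 1/4, the highest of the four.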

Betweenness, $B(n_i)$, measures the probability that a path from actor j to actor k takes a particular route through agent i. Assume each one-step tie has equal weight, and assume that interactions will occur through the shortest routes. Formally, $B(n_i) = \sum_{j \neq k} g_{jk}(n_i)/g_{jk}$, where the sum runs over pairs of actors j and k other than i, $g_{jk}$ is the total number of geodesics from j to k, $g_{jk}(n_i)$ is the number of those geodesics that pass through i, and $1/g_{jk}$ is the probability that a particular geodesic is chosen.

All these measures depend on the size of the network. Thus, the measures must be standardized before comparisons are made between networks with different numbers of entities. Degree measures are standardized by dividing by N-1. Closeness is standardized by multiplying by N-1, which is equivalent to dividing the summed distances by N-1 before inverting. For betweenness measures, the standardization factor is (N-1)(N-2), where N is the number of agents or network size.
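Betweenness and its standardization can be sketched as below. This is an illustrative brute-force count over ordered pairs of actors, not the procedure used to produce the results reported later; the function names are hypothetical.

```python
from collections import deque
from itertools import permutations

def shortest_path_counts(adj, source):
    """Return (dist, sigma): geodesic lengths and number of geodesics from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj, standardize=False):
    """Sum over ordered pairs (j, k) of the share of j-k geodesics through i."""
    nodes = list(adj)
    n = len(nodes)
    info = {s: shortest_path_counts(adj, s) for s in nodes}
    scores = {i: 0.0 for i in nodes}
    for j, k in permutations(nodes, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue  # no path from j to k
        for i in nodes:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sigma_i = info[i]
            # i lies on a j-k geodesic when the two legs add up exactly.
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                scores[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    if standardize:
        scores = {i: b / ((n - 1) * (n - 2)) for i, b in scores.items()}
    return scores

# Digraph A -> B -> C, with D isolated.
adj = {"A": ["B"], "B": ["C"], "C": [], "D": []}
raw = betweenness(adj)
std = betweenness(adj, standardize=True)
print(raw, std)
```

In the ABCD example only B is an intermediary (on the single geodesic from A to C), so its raw score is 1 and its standardized score is 1/6.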

Agents in a network that all relate to each other can be classified as a group, a sub-graph, or, as we shall say here, a component. A component is a maximal subset of related actors in a network. In the ABCD example above, three of the actors are linked to one another through paths of one or more steps. These three, ABC, are thus a sub-graph, group, or component. The whole network has two components, the group ABC and the singleton D.

Components have two forms: strong and weak, depending on the directions of the ties, arcs, or relational links between the members of the component. A strong component is one in which the arcs that make up the paths are aligned in a continuous chain without a change of direction. For example, ABC make up a strong component. A weak component is made of actors that are linked by non-directional edges (Scott, 1991).
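As a rough illustration, the components of the ABCD example can be recovered by ignoring the direction of the arcs, which corresponds to the weak form described above. The function name `weak_components` is hypothetical.

```python
def weak_components(A, labels):
    """Group actors that are linked when arc direction is ignored."""
    n = len(labels)
    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        members = []
        while stack:
            u = stack.pop()
            members.append(labels[u])
            for v in range(n):
                # Treat an arc in either direction as a link.
                if (A[u][v] or A[v][u]) and v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(members))
    return components

# Digraph A -> B -> C, with D isolated.
A = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
components = weak_components(A, ["A", "B", "C", "D"])
print(components)
```

The sketch returns the group ABC and the singleton D, the two components named in the text.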

 

NETWORK ANALYSIS of a COMMUNITY

In this section we develop these graph theory and social network techniques to answer the basic question: what entity(ies) are critical in a small community? We cannot conclude which sector is always key in communities because generalizations should not be drawn from observations about just one community. If we can answer this question for one community, however, we may be motivated to develop the techniques to answer the question about a sample of communities.

To conduct a network analysis toward keystone sector identification that considers more than input-output relations, one needs dyadic data on a variety of relations among a variety of entities in a community. We found this type of data in a collection compiled by Laumann (1985). In the late 1970s, the sociologists Galaskiewicz and Marsden gathered information on the formal organizations in a small U.S. town. The community was nicknamed, and has been referred to since, as Towertown. Of the 73 entities they studied, fewer than a quarter are private businesses (Appendix I). They assayed three relations among these 73 entities: money, information, and support.

In this section we apply network analysis techniques to the Towertown data to do five things. First, we describe community-wide patterns: Is there a specific structure among agents, or do agents interact randomly with each other? Second, we describe the patterns of interactions among the individual agents in the community and compare their various roles. Third, we determine the groups of individual agents whose patterns of interactions are sufficiently similar that they can be treated as types of agents. This step is analogous to classifying numerous individual organisms into less numerous species or functional groups. If we cannot do this, the number of agents will remain "too large," as illustrated in Figure 1. Fourth, we assign ties to the sociomatrix data aggregated according to functional groups. Fifth, we test for the impact of the excision of each functional group from the community network. Are there any species that play such important roles that the community structure will be destroyed without them? This final step completes our identification of the "keystone" sector. Recall that we have defined the keystone as the entity that is relatively unique and without which the community structure will be fundamentally disrupted.

Towertown Survey

Towertown is the nickname given to the midwestern community of 32,000 persons studied by Galaskiewicz and Marsden (Laumann, 1985). A total of 109 organizations were identified in Towertown, and 73 were studied. The 73 included all business firms with more than 20 employees, banks, law firms, political organizations, associations, health institutions, educational institutions, service clubs, labor unions, city offices and departments, and churches.

Galaskiewicz and Marsden interviewed the executive officers of the 73 organizations. Each subject was presented with a list of the other 72 organizations in the community, and was asked the following six questions:

  1. To which organizations on this list would your organization be likely to pass important information concerning community affairs?

  2. To which organizations on this list does your organization rely upon for information regarding community affairs?

  3. To which organizations on this list does your organization give substantial funds as payments for services rendered or goods received, loans, or donations?

  4. From which organizations on this list does your organization get substantial funds as payments for services rendered or goods received, loans, or donations?

  5. Which organizations on this list does your organization feel a special duty to stand behind in time of trouble: that is, to which organizations would it give support?

  6. Which organizations on this list would be likely to come to your organization’s support in time of trouble?

From the responses to these questions, three dyadic relations were defined: INFORMATION (1,2), MONEY (3,4) and SUPPORT (5,6). An organization was determined to be "in relation to" another organization if the former organization answered yes to the first question in a pair, or the latter organization answered yes to the second question in the pair. Note that if either actor in the dyad reported the existence of a tie, a tie was recorded. In other words, for each relation R a 73 x 73 adjacency matrix $A^R$ was constructed with entries $a^R_{ij} = 1$ if the ith actor has a relation R tie with the jth actor and $a^R_{ij} = 0$ if not. (Also, $a^R_{ii} = 0$.)
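The "either party reports it" rule can be sketched on a toy example. The three organizations, their answers, and the names `gives`, `gets`, and `relation_matrix` are all hypothetical; only the rule itself comes from the description above.

```python
import numpy as np

def relation_matrix(gives, gets, n):
    """Build a 0/1 adjacency matrix from paired survey answers.

    gives[i]: organizations i says it sends to (first question of a pair).
    gets[j]:  organizations j says it receives from (second question).
    A tie i -> j is recorded if either party reports it.
    """
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no reflexive ties
            if j in gives.get(i, set()) or i in gets.get(j, set()):
                A[i, j] = 1
    return A

# Hypothetical answers from three organizations, numbered 0 to 2.
gives = {0: {1}, 1: set(), 2: {0}}
gets = {0: set(), 1: {0}, 2: {1}}
A_info = relation_matrix(gives, gets, 3)
print(A_info)
```

In this toy case the tie from organization 1 to organization 2 is recorded even though only the receiver reported it.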

Macro/Community wide structure

Density

The first step is to study the density and connectivity of the whole network. We describe community-wide patterns: Is there a specific structure among agents, or do agents interact randomly with each other? The density measure describes the general level of linkage among the actors in the community. This measure compares the number of actual relations to the number of possible relations, to show how far from completion the Towertown community network is.

Since all three sociomatrices about Towertown have the same number of actors (N = 73), the maximum possible number of non-reflexive arcs is 73 x 72 = 5256 in each. There are 1264 arcs present in the INFORMATION sociomatrix, while there are 512 and 814 in the MONEY and SUPPORT sociomatrices, respectively. Consequently, the density measures are 24%, 9.7% and 15.5% for INFORMATION, MONEY and SUPPORT, respectively. INFORMATION linkages are the most dense, and thus the most complete. MONEY relations are the least dense and thus the least complete in Towertown.
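The density figures can be reproduced from the arc counts reported above with a short calculation. The function name `density` is an illustrative choice.

```python
def density(arcs, n):
    """Share of the n * (n - 1) possible non-reflexive arcs that are present."""
    return arcs / (n * (n - 1))

N = 73
arc_counts = {"INFORMATION": 1264, "MONEY": 512, "SUPPORT": 814}
densities = {name: density(count, N) for name, count in arc_counts.items()}

for name, d in densities.items():
    print(f"{name}: {d:.1%}")
```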

Components

If a sociomatrix is dense, we expect to find a single component. If not, there may be multiple components of various sizes. None of the three Towertown sociomatrices are dense (recall the INFORMATION sociomatrix is the most dense, having 24% of the maximum number of ties). So, we expected to find multiple components in the Towertown network. But we did not. A glance at the matrices indicates that every entity has at least one source or sink of INFORMATION, MONEY, and SUPPORT. There are no columns or rows that have all zeros. Furthermore, none of the connected entities are exclusive. The Towertown community appears to be one big, interconnected group. This analysis shows that in Towertown, there are no subgroups or components (no cliques), and no isolated actors.

Micro/individual roles

Now we know that there are no isolated actors in Towertown. But we also know that the links among all the actors are not all mutual. Next we apply the techniques to describe the roles of individual actors. Which actors are distinguished either because they receive a lot from other actors, or, because they give a lot to other actors? In this section we continue to build the network analysis lexicon by demonstrating the use of measures of indegree and outdegree to describe individual roles.

Local Centrality and Prestige

Prominent actors are those that are extensively involved in relationships with other actors. This involvement makes them more visible in the community. The prominence could be due to both receiving and transmitting. To determine which actors are prominent, we consider all the directed ties (arcs) originating from the actor (outdegree), plus all the received ties (indegree), plus all the indirect ties (multiple-step paths) as well.

Tables 1, 2 and 3 present the measures of local centrality and prestige for MONEY, INFORMATION, and SUPPORT. The measures are unstandardized with respect to the maximal number of ties in each sociomatrix. In this case, where the sociomatrices have equal dimensions and we are not comparing across networks in communities of different sizes, standardization is irrelevant. But if the number of agents surveyed varied from relation to relation, or network to network, one would want to standardize each set of degree measures by normalizing with respect to the number of agents.
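If standardization were needed, a minimal sketch would divide each degree by N-1, the number of other actors. The two rows below are copied from Table 1 (MONEY relation); the function name `standardize` is hypothetical.

```python
N = 73  # number of entities surveyed in Towertown

# (outdegree, indegree) pairs copied from Table 1 (MONEY relation).
money_degrees = {
    "1st TT Bank": (28, 49),
    "TT Newspaper": (33, 17),
}

def standardize(degree, n):
    """Express a degree as a share of the n - 1 other actors."""
    return degree / (n - 1)

standardized = {
    name: (standardize(out_d, N), standardize(in_d, N))
    for name, (out_d, in_d) in money_degrees.items()
}
for name, (centrality, prestige) in standardized.items():
    print(f"{name}: centrality {centrality:.2f}, prestige {prestige:.2f}")
```

Read this way, 1st Towertown Bank receives money from about 68% of the other 72 entities.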

The first four rows in each of Tables 1-3 show the sample statistics for each degree measure for selected agents in Towertown. The mean of local centrality, for example, is the average outdegree among the 73 entities. With respect to MONEY ties (Table 1), on average, a Towertown entity gives money to 7 other entities in Towertown. The mean of local prestige is the average indegree measure across all 73 entities. For example, with respect to INFORMATION ties (Table 2), on average, a Towertown entity receives INFORMATION from 17 entities in Towertown.

The remaining rows in each table show the local centrality and prestige measures of the six most central and prestigious individual entities in Towertown. For example, with respect to MONEY (Table 1), Towertown Newspaper is the most locally central, giving money to 33 other entities in Towertown. 1st Towertown Bank is the most prestigious, receiving money (presumably deposits or interest payments) from 49 entities in Towertown.

Table 1. MONEY

                         LOCAL CENTRALITY   LOCAL PRESTIGE
sample statistics:
Mean                           7.01              7.01
Std. Dev.                      6.35              8.97
Minimum                        0                 0
Maximum                       33                49
top six entities:
(11) 1st TT Bank              28                49
(12) TT Savings bank          17                38
(13) Bank of TT               23                26
(39) TT Newspaper             33                17
(69) Family Services           3                36
(71) YMCA                      4                24

Table 1 shows that four of the top six agents are both locally central and prestigious. These agents are three of the four banks and Towertown Newspaper. Family Services and YMCA, on the other hand, are only locally prestigious (high indegree but low outdegree, and both outdegree measures are below the mean). These measures identify some intuitively logical patterns and relative positions. Banks both give (make loans) and receive (accept deposits) money, and the local centrality and prestige measures document that. Also, Family Services and the YMCA both rely on volunteers and donations, so it is logical that they would show up as major recipients of money. Note also that the newspaper is a source of money for many entities.

In the case of the INFORMATION relation (Table 2), again the top six agents appear to be both locally central and prestigious. All six agents receive and give information to a significantly (>2 SD) higher number of agents in the community than the rest of the entities. The Radio Station appears to be the most locally central and prestigious in Towertown.

Table 2. INFORMATION

                              LOCAL CENTRALITY   LOCAL PRESTIGE
sample statistics:
Mean                               17.29             17.29
Std. Dev.                          11.21             11.22
Minimum                             1                 3
Maximum                            63                62
top six entities:
(12) TT Savings bank               41                26
(25) City Council                  32                43
(26) City Manager’s office         43                44
(39) TT Newspaper                  42                44
(40) WTWR Radio                    63                62
(69) Family Service                45                36

Table 3 shows the results with respect to the SUPPORT relation. The Community College both receives and gives more support than all other entities. "In times of trouble," the Community College is willing to give support to 59 entities, and 58 entities are willing to give support to the Community College.

Table 3. SUPPORT

                           LOCAL CENTRALITY   LOCAL PRESTIGE
sample statistics:
Mean                            11.12             11.12
Std. Dev.                        9.42             10.53
Minimum                          0                 0
Maximum                         59                58
top six entities:
(12) TT Savings bank            37                14
(20) Small Bus. Assoc.          38                10
(56) TT Comm. College           59                58
(57) State University           20                42
(69) Family Service             12                52
(72) Mental Health              16                36

Towertown Savings Bank appears to be locally central with respect to SUPPORT, but not locally prestigious (it has a high outdegree, but its indegree is close to the mean). This contrasts with the MONEY relation, in which its local centrality is lower than its local prestige. There, Towertown Savings Bank has a lower outdegree (17) than indegree (38), suggesting that it makes loans to fewer entities in the community than the number from whom it receives deposits. The bundling of many small deposits to make a few larger loans is the main business of financial intermediaries, known as "asset transformation." It is interesting to note, however, that this particular bank does not attract as much SUPPORT as it says it would give.

Note also that different actors are prominent across the MONEY, INFORMATION and/or SUPPORT relations. Towertown Savings and Loan and Towertown Family Service entities are, however, central and/or prestigious in all three of the relations.

In sum: we have shown how to classify entities with large outdegrees as important sources of INFORMATION, MONEY, or SUPPORT. Actors with low in- or out-degrees are apparently less active, less important, or peripheral to a network.

Note, however, the limitations of dichotomous rather than value data. In dichotomous (0/1) data, what counts is the number of ties, not their level. While this may be quite appropriate for the analysis of non-rival types of interactions such as information and support ties, it does not fully describe rival transactions such as money ties. It is likely that agents with a few high-value MONEY ties play more important roles than can be perceived with dichotomous data. For example, a business that buys large amounts of goods and services from just a few other local businesses (who then buy from other local businesses, employ many residents, and provide many donations; and so on) may not have so many ties, but because of their magnitude, its few ties may be critical to the community.

Global centrality

The global centrality measures are based on the length and the number of carrier and multiple-step path roles. When an actor has a position of strategic significance in the overall network, that actor is considered globally central. Interactions between non-adjacent actors depend, by definition, on an intermediary. Thus, globally central actors can have widespread effects on a community because they are more closely tied with more of the other agents, and act as intermediaries for more agents, than any others. Globally central actors are detected using the closeness and betweenness measures.

The summary statistics for global centrality in Towertown, according to both measures, with respect to MONEY, INFORMATION and SUPPORT relations are shown in Table 4:

Table 4. GLOBAL CENTRALITY, sample statistics

                     CLOSENESS                      BETWEENNESS
            MONEY     INFO    SUPPORT      MONEY       INFO     SUPPORT
Mean        52.34    59.06     53.42       88.51      57.68       78.73
SD           6.74     7.13      8.38      244.87     131.43      238.37
Minimum     37.50    46.75     32.00        0          0.06        0
Maximum     78.26    88.89     82.76     1762.24    1013.23     1998.32

The higher the closeness of an entity, the more often entities are directly tied to it. Note that on average there are more direct INFORMATION ties (the mean of closeness is highest) than SUPPORT or MONEY ties. The mean of betweenness is the average number of times an actor is articulating (that is, serves as the intermediary in the shortest path) in a relation between two other actors. With respect to MONEY flows, on average a Towertown entity serves as intermediary in about 89 relations between other entities.

Table 5 shows the measures for the most globally central entities according to closeness and betweenness measures. The higher the value of closeness, the shorter are the paths to the entity relative to other entities in the community. For example, 1st Towertown Bank reaches the most other agents with the fewest MONEY steps. 1st Towertown Bank is significantly closer (more than 2 SD's higher than the mean) than other entities in Towertown. Comparing two of the banks, 1st Towertown Bank (closeness value = 78) is closer to the rest of the agents in Towertown than the Bank of Towertown (closeness value = 65).

Table 5. GLOBAL CENTRALITY

                            CLOSENESS                  BETWEENNESS
                      MONEY   INFO   SUPPORT     MONEY   INFO   SUPPORT
1ST TT Bank (11)        78     -       -          1762     -       -
TT Savings (12)         71     -       -           686     -       -
Bank of TT (13)         65     -       -           409     -       -
Ch. Commerce             -     -       -             -    222      -
City Council (25)        -    73       -             -     -       -
City Mng’s office        -    76       -             -    308      -
TT News (39)            71    75       -           900    344      -
RADIO (40)              67    89       -             -   1013      -
CmtyCollege (56)         -     -      83             -    253    1998
State Univ. (57)         -     -      70             -     -      222
Family Serv (69)         -    76      77             -     -      191

The higher the value of betweenness, the more potential an actor has to control third-party relationships. Again, 1st Towertown Bank shows the highest betweenness measure in the MONEY relation; it appears to be in the strongest control position. In both MONEY and SUPPORT flows, the entities that appear to be the most 'close' also appear to be the most 'between' actors. This analysis of betweenness in the MONEY relation shows that banks and the newspaper are the ‘main roads’ to other entities. They act as intermediaries for more other agents than any other entities in Towertown. This helps explain why the newspaper is the most locally central (has the highest outdegree in the MONEY relation) according to the measures shown earlier in Table 1.

If we consider the flow of SUPPORT, the Community College is a hub in Towertown. The global centrality of the Community College is shown both by its closeness to the rest of the agents and because it has a major articulation role as evidenced by its maximum betweenness score.

The findings for the flow of the INFORMATION relation are also consistent with our intuition or priors. Our analysis of the INFORMATION network in Towertown shows that the media organizations (the newspaper and the radio station) and the community's coordinating office (the city manager’s office) are globally central.

Peripheral Entities

This digression concerns how to identify (and exclude from further consideration?) the entities that are the least likely to be keystones. The actors with the lowest values of local and global centrality are the actors that are peripheral to the network. According to the Towertown data, the entities that do not have many interactions with other entities include the unions, the churches, and some of the government offices (measures not shown). Their counts of interactions are significantly (more than 2 SD) below the means.

Two actors are notoriously peripheral in all the relations: the bankers’ association and the county bar association. Why might that be the case? The network analysis shows that BANKS relate to many entities. Banker-members of the banking association clearly have many ties as professionals. But as an association, its relations with other entities are few. This may be because there is no need for the association to have them. It would be redundant to replicate the plentiful ties the member-bankers already have.

Entities Critical to Network Structure

Now that we know about the centrality and prestige of agents, we have filled in some gaps left from our analysis of components. In the analysis of components we showed that all Towertown entities are interconnected and there are no cliques or subgroups. This does not provide any hints that any particular set of agents displays unique patterns of relations. The analyses of centrality and prestige, however, showed that there are differences among entities. Some, notably banks, are better connected than others. But we still do not know whether banks' connections are critical to maintaining the structure of the Towertown community. This next section presents tests for this aspect of "keystoneness". If the single-component structure of the Towertown network is destroyed by the excision of an entity, that entity is considered critical for the maintenance of Towertown's structure.
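The excision test described above can be sketched as follows: remove one entity at a time and check whether the network splits into more components than before. This is an illustrative brute-force version on a hypothetical five-node network, with connectivity judged while ignoring arc direction (an assumption of the sketch); the function names are not from the original analysis.

```python
def count_components(A, removed=None):
    """Count components, ignoring arc direction, with one node optionally excised."""
    n = len(A)
    active = [v for v in range(n) if v != removed]
    seen = set()
    count = 0
    for start in active:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in active:
                if v not in seen and (A[u][v] or A[v][u]):
                    seen.add(v)
                    stack.append(v)
    return count

def critical_entities(A):
    """Entities whose excision increases the number of components."""
    baseline = count_components(A)
    return [v for v in range(len(A))
            if count_components(A, removed=v) > baseline]

# Hypothetical network: 0 -> 1 -> 2 -> 0 (a cycle), 2 -> 3 -> 4 (a chain).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
critical = critical_entities(A)
print(critical)
```

In this toy network, removing node 2 separates the cycle from the chain and removing node 3 isolates node 4, so both are flagged as critical.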

Cut-points

Entities that are vital for the connectivity of a network are those without which other entities will become isolated. Now we shall test if any individual actors that have been shown to be central are also critical to maintenance of the network structure. The central entities are,