Agglomeration of invention in the Bay Area: Not just ICT *

We document that the Bay Area rose from 4% of all successful US patent applications in 1976 to 16% in 2008. This is partly driven by the increase in the prevalence of information and communication technology; however, even for patents unrelated to information and communication technology, we see a disproportionate increase in the share of all US patents from the Bay Area. We interpret this to suggest that there has been a trend toward coagglomeration in invention across technologies. We explore several possible explanations for this trend, and conclude that neither firm size nor simple measurement error can explain it.


Introduction
Does invention agglomerate, and if so, where does it agglomerate? In this paper we examine changes in patterns of agglomeration in invention over time, using data on all US patent applications.
There are plenty of reasons to expect invention to agglomerate. Carlino and Kerr's (2015) recent handbook chapter summarizes many such results, emphasizing the role of input sharing, labor market matching, and knowledge spillovers, among others. Knowledge spillovers received an especially large fraction of attention in their chapter, and in the literature overall (e.g., Audretsch and Feldman 1996; Agrawal and Henderson 2002; Moretti 2004, 2012; Rosenthal and Strange 2008; Jaffe, Trajtenberg, and Henderson 1993; Thompson and Fox-Kean 2005; Kerr and Kominers 2014; Moser 2011; Waldinger 2012; Azoulay et al. 2010).
Simple economics might forecast that most invention agglomerates in the same area as the primary using industry (Carlino and Kerr 2015). For example, patents related to automotive technology are clustered in Detroit (Hannigan, Cano-Kollmann, and Mudambi 2015). Or causality could be reversed: the location of a breakthrough invention can lead to industry agglomeration and localized follow-on invention (Duranton 2007; Kerr 2010). We label this "colocation" of invention and industry.
However, other forces push away from colocation. Invention itself is an economic activity and it shares inputs, such as specialized labor institutions, particular intellectual property contracts, and information spillovers from one type of invention to another. If such forces are strong, they could lead to agglomeration of many different types of invention in one place. We call this "coagglomeration of invention." For many industries, the key inventions could be in a location distinct from the place where production for the downstream using industries resides.
Using patent data to measure invention, there are two approaches to investigate colocation and coagglomeration of invention. One is to map the agglomeration of downstream industries and invention and measure the geographic correlation. We take another approach. We look for evidence of coagglomeration of invention: namely, invention from distinct technology areas appearing in the same location, irrespective of downstream using industry.
We find evidence consistent with the hypothesis of coagglomeration. We demonstrate a strong trend toward the clustering of patenting in the San Francisco Bay Area, from 4% of US patents in 1976 to 16% of US patents in 2008, a time period when the fraction of the US population in the Bay Area did not increase substantially relative to the US population as a whole. 1 While this increase in Bay Area patenting is partly driven by the increasing fraction of patents in information and communication technologies (ICTs), ICTs cannot fully explain the trend. The San Francisco Bay Area has seen a substantial increase in its share of patents, even for patents that seem quite distant from ICTs. Our broadest definition of ICT patents includes all patents in information technology, communications technology, electronics, a broad measure of software based on patent classes and a textual search of patent titles and abstracts, and all patents that cite any of these patents. The remaining non-ICT patents were 66% of all patenting in 1976 but just 26% in 2008. In 2008, 6.2% of such patents have inventors based in the Bay Area, second only to New York City's 8.1%. 2
Our results are consistent with coagglomeration of invention in the Bay Area. While others have documented a tendency toward agglomeration of patenting by industry, we believe we are the first to document a general tendency toward agglomeration in patenting across industries and patent classes. Further, our study is unique in its documentation of agglomeration in one particular region, the Bay Area.
Coagglomeration has been documented in other settings and other industries. For example, Rosenberg (1963) analyzes how producers of sewing machines, bicycles, and automobiles located in northern Ohio and southeastern Michigan because they shared the same set of inventions in machine tools, and how the growing downstream industries induced additional improvements in those innovations over time. Glaeser (2005) discusses coagglomeration of many industries in New York City, starting in the nineteenth century. Summarizing prior literature, he argues that New York's dominance started with shipping. There is a clear reason why New York could dominate in shipping: New York has a particularly appealing natural harbor beside an inland waterway. Shipping led to risk sharing and insurance, which led to finance. Shipping also led to manufacturing and early book publishing. Population growth, combined with manufacturing and finance, led to other services. A number of recent researchers have explored the causes and consequences of such coagglomeration, including Ellison, Glaeser, and Kerr (2010) and Helsley and Strange (2015). Of particular relevance to our study, Delgado, Porter, and Stern (2014) document complementarities between employment and patenting in regions with multi-industry clusters.
At this point, our results do not provide a definitive conclusion on the cause of this broad increase in coagglomeration in invention. A variety of mechanisms are possible, including regulations such as nonenforcement of non-compete clauses (Franco and Mitchell 2008; Marx, Singh, and Fleming 2015), agglomeration or expertise in startup financing (e.g., Chen, Gompers, Kovner, and Lerner 2010), shared labor markets across invention types (Almeida and Kogut 1999), and knowledge spillovers across invention types.
One important limitation of our analysis is the use of patents as a measure of invention. Patents measure invention imperfectly, and the ease with which they can be measured means that economists have been perhaps overly focused on patents to measure invention. Some patents are more important than others and many inventions are never patented. Still, patents are a useful measure because they are observable and comparable across time and categories. Our reliance on patents will bias our results if inventors in the Bay Area have become increasingly likely to patent when they invent.
Of course, we are not the first to document the agglomeration of ICT in the Bay Area. Garcia-Vicente et al. (2014) show that such agglomeration took place primarily in the 1980s and 1990s. Our results are consistent with this timing. A variety of authors have explored the reasons behind the agglomeration of the ICT industry in the Bay Area and its dynamics in generating new firms and new ideas (Almeida and Kogut 1999; Kerr and Kominers 2015; Saxenian 1994; Franco and Mitchell 2008; Marx, Singh, and Fleming 2015, etc.). Our contribution relates to the finding of the increasing role of the Bay Area in patenting overall.
2 One unusual aspect of patenting in the San Francisco Bay Area is that invention is not centered in the city but in Silicon Valley. Therefore, while we refer to other cities by the city names, we refer to the "Bay Area" rather than "San Francisco" to describe the San Francisco Consolidated Metropolitan Statistical Area. We use the 2013 definition, which includes the following 12 counties: Alameda (fips 06001), Contra Costa (fips 06013), San Francisco (fips 06075), San Mateo (fips 06081), Marin (fips 06041), Santa Clara (fips 06085), San Benito (fips 06069), San Joaquin (fips 06077), Sonoma (fips 06097), Solano (fips 06095), Santa Cruz (fips 06087), and Napa (fips 06055).

Data and empirical strategy
We use patents granted by the US Patent and Trademark Office (USPTO) as our measure of invention. Because of the delay between patent application and grant date, we date patents using the year of application. We have data on patents granted between 1976 and 2012, and our analysis data set includes patents with application dates between 1976 and 2008. We cut off the last four years of the data because of lags between year granted and year filed. Generally, we start to see a decline in patenting in 2008, suggesting right truncation may be an issue for the last few years of our data. The trends we identify appear long before 2008.
Patents have been shown to provide a useful measure of a firm's intangible stock of knowledge (Hall et al. 2005). Their limitations are well known. Not all patents meet the USPTO criteria for patentability (Jaffe and Trajtenberg 2002). Not all inventors seek to patent, and many use alternative means to appropriate value from their inventions. Further, the propensity to patent has changed over time during our sample (e.g., Hall and Ziedonis 2001). This was particularly the case for patents related to software, which grew rapidly toward the end of our sample period due to legal changes that strengthened the legal rights of patents in this area (e.g., Graham and Mowery 2003; Hall and MacGarvie 2010). Our use of patent citations as a measure of knowledge flows between successive generations of inventions can also create measurement error (Roach and Cohen 2013). We are comfortable with using patents in this context because our primary focus is on changes in the geographic distribution of patenting within broad technology areas over time. While the propensity to patent has changed across patent classes over time, we do not believe it has changed significantly across geographic locations patenting within a patent class.
We map inventors to counties and MSAs using the zip code of the location of the inventor. We used consolidated MSAs (CMSAs) where those were present. This will be particularly important for our analysis of the Bay Area, which includes several component PMSAs such as Oakland, San Francisco, San Jose, Santa Cruz-Watsonville, Santa Rosa, Stockton-Lodi, Vallejo-Fairfield, and Napa.
For most of the analysis that follows, we do not weight by citations. For multi-inventor patents, we divide by the number of inventors. For example, if a patent has 1 inventor in the Bay Area and 2 inventors in Boston, it would count as 1/3 of a patent in the Bay Area and 2/3 of a patent in Boston. Our results are generally robust, and often stronger, using three-year and five-year citation-weighted measures. For example, using either three- or five-year citation weights, the Bay Area surpasses New York City as the location with the most patents three years earlier than with the unweighted measure.
Some of our results require us to recognize a consistent identifier for assignees (especially firms) within a particular application year. For this paper, we do not seek to identify changes in patenting activity within firms over time. Because assignee names are not coded consistently within the patent data, the challenges of mapping patents to assignees are well known. No prior data set provides a complete set of cleaned assignee names during our sample period. Using the data file of standardized names in the NBER database and the names in ICT industries compiled by Ozcan and Greenstein (2013) as starting points, we cleaned the assignee names to create our own assignee identifier.
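The fractional counting rule described above can be illustrated with a minimal sketch. The function name `fractional_counts` and the input layout (one list of inventor locations per patent) are assumptions made for illustration and are not the procedure or data format actually used in the paper.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_counts(patents):
    """Split each patent equally across its inventors and sum by location.

    `patents` is an iterable of lists; each list holds the location of every
    inventor on one patent (hypothetical input layout).
    """
    totals = defaultdict(Fraction)
    for inventor_locations in patents:
        if not inventor_locations:
            continue  # skip patents with no located inventor
        share = Fraction(1, len(inventor_locations))
        for location in inventor_locations:
            totals[location] += share
    return dict(totals)


# The example from the text: one Bay Area inventor and two Boston inventors.
counts = fractional_counts([["Bay Area", "Boston", "Boston"]])
print(counts["Bay Area"], counts["Boston"])  # prints: 1/3 2/3
```

Exact fractions are used here only so that the shares of a single patent always sum to one; floating-point weights would serve equally well for aggregate shares.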
Our analysis requires us to identify patents that represent inventions related to ICT, or inventions that draw upon the stock of knowledge related to ICT. As is well known, identifying such inventions through the patent data is notoriously difficult (see, e.g., Graham and Mowery 2003; Bessen and Hunt 2007; Hall and MacGarvie 2010). As a result, we use several different definitions based on the primary class of the patent and explore the robustness of our results to four alternatives. We discuss the construction of these alternatives in the online appendix.
Our data contain a total of 2,213,271 patents. In 1976, there were 41,100 new patents issued from the PTO. At the peak of our data in 2007, there were 100,832 patents.
We present our results at the year level, as aggregated means over the 33 years from 1976 to 2008 inclusive. In particular, our results are presented as graphs of time trends of the fraction of patents each year that meet some criteria, such as being based in the Bay Area. This is therefore a descriptive exercise that tests whether the results are consistent with increasing coagglomeration in the San Francisco Bay Area over time. We have not determined the primary cause(s) of the observed patterns.
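The yearly fraction that underlies each time-trend graph can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout `(application_year, location, weight)`, the function name `yearly_share`, and the toy numbers are hypothetical and do not come from the paper's data.

```python
from collections import defaultdict


def yearly_share(records, region):
    """Return {year: fraction of that year's patent weight located in `region`}.

    `records` is an iterable of (application_year, location, weight) tuples,
    where `weight` is the fractional patent count credited to that location.
    """
    region_total = defaultdict(float)
    all_total = defaultdict(float)
    for year, location, weight in records:
        all_total[year] += weight
        if location == region:
            region_total[year] += weight
    return {year: region_total[year] / all_total[year] for year in sorted(all_total)}


# Toy records (not the paper's data): two application years, two locations.
records = [
    (1976, "Bay Area", 1.0),
    (1976, "New York", 3.0),
    (2008, "Bay Area", 1.0),
    (2008, "New York", 1.0),
]
shares = yearly_share(records, "Bay Area")
print(shares)  # prints: {1976: 0.25, 2008: 0.5}
```

Dividing by the yearly total is what removes the overall rise in patenting from the comparison, so that only changes in a location's share remain.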

Results
a) Patenting across locations
Given the overall rise in the propensity to patent, all major cities had an increase in the number of patents. We explore the fraction of all US patents by city, thereby controlling for the overall trend.
Figure 1 shows the increasing importance of the Bay Area as a fraction of US patenting. Figure 1a compares the top 10 cities in the United States, defined by the total number of patents between 1976 and 2008. In 1976, New York City was the dominant center for patenting, with just under 15% of all patents. Los Angeles was second and Chicago was third. Generally, patenting was highly correlated with population. The Bay Area rose steadily as a fraction of patenting in the 1970s and 1980s; the rate of increase then rose in the 1990s before settling back to its earlier pace in the 2000s. In 1995, the Bay Area surpassed New York City as the US location with the largest number of patents. Figure 1b contrasts the 11th through 20th cities in patenting with the Bay Area in order to show that no other city has a rise similar in scale.
Figure 1c combines locations into four groups: the Bay Area, New York City, the 18 other cities in the top 20, and all other locations. Generally, while New York and locations outside the top 20 are falling as a proportion of patenting, the Bay Area is rising quickly, and the other 18 cities in the top 20 are rising slightly (42.6% in 1976 to 46.1% at the peak level in 2004).

b) Patenting across types of patents
The Bay Area has had a cluster of ICT firms for many years. Therefore, one reason the Bay Area accounts for an increasingly large fraction of patenting is the overall increase in ICT patents. Figure 2 displays this increase using the Hall, Jaffe, and Trajtenberg (HJT) definitions of patent classes. Computers and Communication (Class 2) went from under 10% of patents to over 30% of patents between 1980 and 2005. Some of this growth may reflect changes in the propensity to patent software and other ICT inventions (e.g., Graham and Mowery 2003; Hall and Ziedonis 2001) that have been encouraged by sympathetic treatment in the courts and the PTO. Drugs and Medical (Class 3) tracked the increase in Computers and Communication until the mid-1990s but then settled back to around 13% of patents.
Figure 3 offers our first evidence for the coagglomeration hypothesis: it shows the fraction of patents that are in the Bay Area by broad class. The increase is sharpest in Computers and Communication and in Electrical and Electronic (Class 4). It is also noticeable in Chemicals (Class 1), Drugs and Medical, and Mechanical (Class 5). In Other (Class 6) the increase is smaller, rising from 3.8% in 1976 to a peak of 6.3% in 2004 before falling back to 4.4% in 2008. Thus, for five of six broad patent classes, we see a noticeable rise in the proportion of patents coming from the Bay Area.
One possibility is that many of the patents in Chemicals, Drugs and Medical, and Mechanical classes are ICT-based. Software has increasingly been used as an input into a wider array of inventions in other patent categories (Arora, Branstetter, and Drev 2013; Branstetter, Drev, and Kwon 2015), taking advantage of increasingly inexpensive and more capable electronics, especially processors. Figure 4 provides alternative measures of ICT and non-ICT patents to account for this possibility, and examines the trend over time.
Panel A of Figure 4 provides the narrowest definition. It defines ICT patents as patents in HJT Class 2: Computers and Communication. The solid line at the top of the graph shows the increasing proportion of Class 2 patents that are in the Bay Area, replicating the solid line at the top of Figure 3. Panel B provides a wider definition, including software patents as defined by Graham and Mowery (2003). Panel C provides a still-wider definition, adding to the definition in panel B all Electrical and Electronic patents (HJT Class 4). Panel D provides our widest definition, which uses the definition in Panel C as a starting point and then widens it to include software patents identified through a keyword search as in Bessen and Hunt (2007) and software patents identified in Graham and Vishnubhakat (2013). This broadest definition includes 74% of all patents in 2008 and is likely to include many false positives. Using all four increasingly broad definitions of ICT patents, the solid lines show that the proportion of all ICT patents in the Bay Area has risen sharply since the 1970s.
The dotted line identifies all patents that cite ICT patents. In each panel, the ICT patents are defined as in the previous paragraph. These patents are not explicitly categorized as ICT using the definition above, but they are connected through citation and therefore build on ICT invention. There is a clear trend toward an increasing proportion of these patents in the Bay Area, providing another explanation for the rise of Bay Area patents. Together, the above suggest the following: ICT is an increasingly large fraction of patents; the Bay Area is an increasingly large (and even dominant) fraction of ICT patents; and the Bay Area is an increasingly large fraction of patents that cite ICT patents. Given prior results on agglomeration of the ICT industry in the Bay Area, perhaps none of these results is surprising, though we believe that the results on the geography of patents that cite ICT have not previously been documented. These all could result from agglomeration of software invention near the location of the firms producing electronics, computing, and communications.
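The three groups plotted in each panel of Figure 4 (ICT patents, patents that cite ICT patents, and the remaining non-ICT patents) can be sketched as a simple classification. This is an illustration only: the function name, the input structures, and the choice to count direct citations only are assumptions, since the text does not spell out how citation links were traced.

```python
def classify_patents(patent_ids, ict_ids, citations):
    """Assign each patent to one of three mutually exclusive groups.

    patent_ids: iterable of patent identifiers
    ict_ids:    set of identifiers classified as ICT under a chosen definition
    citations:  dict mapping a patent id to the ids it cites (direct cites only)
    """
    groups = {}
    for pid in patent_ids:
        if pid in ict_ids:
            groups[pid] = "ICT"
        elif any(cited in ict_ids for cited in citations.get(pid, ())):
            groups[pid] = "cites ICT"
        else:
            groups[pid] = "non-ICT"
    return groups


# Hypothetical toy data: P1 is ICT, P2 cites P1, P3 cites only P2.
groups = classify_patents(
    ["P1", "P2", "P3"],
    ict_ids={"P1"},
    citations={"P2": ["P1"], "P3": ["P2"]},
)
print(groups)  # prints: {'P1': 'ICT', 'P2': 'cites ICT', 'P3': 'non-ICT'}
```

Swapping in a broader `ict_ids` set corresponds to moving from Panel A toward Panel D: the ICT group grows and the non-ICT group shrinks.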
The evidence for coagglomeration of invention appears in the dashed line in Figure 4: the Bay Area is an increasing fraction of non-ICT US patents, even for the broadest definitions of software. Panel D shows that the fraction of non-ICT patents in the Bay Area rises from 3.9% to 6.2% from 1976 to 2008, a 59% increase. Under the narrower software definition in Panel C, the value rises from 3.9% to 6.9%. Dropping electronics, as in Panel B, the proportion of non-ICT patents in the Bay Area rises from 4.3% to 9.5%.
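The 59% figure is the relative change in the Bay Area's share, not a change in percentage points. A short sketch of the arithmetic, using only the start and end shares quoted above, is given below; the Panel C and Panel B percentages it prints are computed here for comparison and are not stated in the text.

```python
def relative_increase(start_share, end_share):
    """Relative change between two shares, as a fraction of the starting share."""
    return (end_share - start_share) / start_share


# Bay Area share of non-ICT patents in 1976 and 2008 (percent), as quoted above.
panels = {"D": (3.9, 6.2), "C": (3.9, 6.9), "B": (4.3, 9.5)}
increases = {
    panel: round(100 * relative_increase(start, end))
    for panel, (start, end) in panels.items()
}
print(increases)  # prints: {'D': 59, 'C': 77, 'B': 121}
```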
While these figures are more modest than the increase in ICT patents, they still suggest an increasingly important role for the Bay Area, relative to all other areas, in US non-ICT patenting. Figure 5 compares the Bay Area to the four other top patenting cities in the United States. Panel A uses the narrower ICT definition that includes Software and Computers and Communications patents, as in Figure 4 panel B. Under this definition, the Bay Area overtook New York as the top location for non-ICT patenting in 2000. Panel B includes Electrical and Electronic patents as in Figure 4 panel C. Under this broader definition, the Bay Area was second behind New York for most of the period from 1997 to 2008. Using the broadest definition (as in Figure 4 panel D) yields a similar pattern (though, as noted above, that definition will include many false positives on software patents).
Overall, we interpret these results to suggest that we cannot reject coagglomeration of invention. The increase in patenting in the Bay Area is not entirely attributable to the increasing fraction of ICT patents in overall patenting.

c) Patenting across types of patentees
Does the evidence for coagglomeration reflect some other factors, such as firm size? We look at the fraction of patents in the Bay Area by size of patentee. We find that size does not explain the results.
We split patentees into four categories: independent inventors, firms in the 99th percentile of firm patentees, firms between the 50th and 99th percentile of firm patentees, and firms below the 50th percentile of patentees. Figure 6 Panel A shows that the fraction of patents in the Bay Area is rising sharply over time for all four groups. The 99th percentile group has a more discontinuous increase, largely driven by a sharp increase in the proportion of top patenting firms that are ICT firms in the 1990s. The only exception to the general trend is that, since 2000, the proportion of all patents from smaller patenting firms (below the 50th percentile) in the Bay Area has declined to the levels of the early 1990s. Panel B looks at ICT patents only (defined to include software and electronics as in Figure 4 panel C) and shows a steady rise in the proportion of such patents in the Bay Area between 1976 and 2006, again with the exception of smaller patenting firms since 2000.
Panel C looks at non-ICT patents, again defined as in Figure 4 panel C. The fraction in the Bay Area has been rising steadily for firms below the 99th percentile and for independent inventors. In the 99th percentile, results are noisier, as the entry or exit of one firm from the 99th percentile can make a meaningful difference. For example, the value is generally under 3% with the exception of the period 1997 to 2002. During this period, at least one of Genentech, Advanced Cardiovascular Systems, and Incyte Pharmaceuticals was in the 99th percentile of non-ICT patenters each year.

Conclusions
We have documented an increase in the fraction of US patenting of all kinds that occurs in the Bay Area, an increase that is disproportionate to population growth and occurs within a variety of patent classes. This partly results from the agglomeration of invention near the production of firms that use the invention, and that themselves agglomerate in one area. We also think it offers evidence of coagglomeration, the clustering of many distinct types of invention in one geographic area.
While we do not know the cause of the rise in coagglomeration of many patent types in the Bay Area, our results suggest that any possible explanation must be broad-based. In particular, any explanation must account for growth in the fraction of both ICT and non-ICT patents in the Bay Area, and for the fact that the increase holds for large inventing firms, small inventing firms, and independent inventors.