Field Evidence on Individual Behavior & Performance in Rank-Order Tournaments Kevin J. Boudreau Constance E. Helfat Karim R. Lakhani Michael Menietti Working Paper 13-016 August 9, 2012 Copyright © 2012 by Kevin J. Boudreau, Constance E. Helfat, Karim R. Lakhani, and Michael Menietti Working papers are in draft form. This working paper is distributed for purposes of comment and discussion only. It may not be reproduced without permission of the copyright holder. Copies of working papers are available from the author. FIELD EVIDENCE ON INDIVIDUAL BEHAVIOR & PERFORMANCE IN RANK-ORDER TOURNAMENTS* Kevin J. Boudreau, London Business School, Harvard Business School, Harvard-NASA Tournament Laboratory kboudreau@london.edu Constance E. Helfat, Constance.E.Helfat@tuck.dartmouth.edu Tuck School of Business at Dartmouth College Karim R. Lakhani, k@hbs.edu Harvard Business School, Harvard-NASA Tournament Laboratory Michael Menietti mmenietti@fas.harvard.edu Harvard University, Harvard-NASA Tournament Laboratory August 9, 2012 Abstract Economic analysis of rank-order tournaments has shown that intensi ied competition leads to declining performance. Empirical research demonstrates that individuals in tournament-type contests perform less well on average in the presence of larger number of competitors in total and superstars. Particularly in ield settings, studies often lack direct evidence about the underlying mechanisms, such as the amount of effort, that might account for these results. Here we exploit a novel dataset on algorithmic programming contests that contains data on individual effort, risk taking, and cognitive errors that may underlie tournament performance outcomes. We ind that competitors on average react negatively to an increase in the total number of competitors, and react more negatively to an increase in the number of superstars than non-superstars. We also ind that the most negative reactions come from a particular subgroup of competitors: those that are highly skilled, but whose abilities put them near to the top of the ability distribution. For these competitors, we ind no evidence that the decline in performance outcomes stems from reduced effort or increased risk taking. Instead, errors in logic lead to a decline in performance, which suggests a cognitive explanation for the negative response to increased competition. We also ind that a small group of competitors, who are at the very top of the ability distribution (non-superstars), react positively to increased competition from superstars. For them, we ind some evidence of increased effort and no increase in errors of logic, consistent with both economic and psychological explanations. JEL Codes: D03 * Jack Hughes, Robert Hughes, Ira Heffan, and Mike Lydon from TopCoder generously provided their time and assistance with this paper. Seminar participants at Duke University and Harvard Business School provided feedback on this paper. The Harvard-NASA Tournament Laboratory supported this work. All mistakes remain our own. 1 1 Introduction Rank-order tournaments and contests have attracted great interest since the seminal work of Lazear and Rosen (1981) on optimal labor contracts. Tournament theory has been applied to a range of diverse activities, including academic achievement, amateur and professional sports, arts, architecture, manual labor, sales, engineering and scienti ic work, and executive promotion.1 Prize-based contests have also been studied as a means to spur innovation (Wright, 1983; Kremer, 1998; Scotchmer, 2004; Terwiesch and Xu, 2008; Kremer and Williams, 2010). These types of contests played an important historical role in driving technological development in a range of industries including agriculture (Brunt, Lerner, and Nicholas, 2011). More recently, large-scale online contest platforms that provide on-going tournament-based work and compensation have emerged in areas such as scienti ic problem-solving, software development, graphic arts design and creative performance, as anticipated by Autor (2001). Today large industrial companies increasingly use online contests as a complement to in-house research and development. Theories of behavior and performance in rank-order tournaments have analyzed the provision of economic incentives, as well as strategic responses on the part of participants when contests are used to elicit effort (Lazear and Rosen, 1981; Taylor, 1995; Prendergast, 1999; Moldovanu and Sela, 2001). One issue in the design of tournaments concerns the effect of the number and skill distribution of competitors on the elicitation of effort and performance outcomes. Empirical research on tournaments has shown that individuals perform less well on average when faced with a larger number of competitors in total and a larger number of superstars. This evidence has come primarily from experimental settings and sporting events. Particularly in ield settings, studies often lack direct evidence about the underlying mechanisms, such as the amount of effort, that might account for these results. In this study, we exploit a novel dataset on algorithmic programming contests that contains data on individual effort, risk taking, and cognitive errors, enabling us to shed greater light on the mechanisms that underlie performance outcomes in the face of increased competition. As a simple example of a one-shot tournament, consider amateur golfers playing a round. Players with similar handicaps, i.e., of equal ability, might prefer to play in a twosome rather than a foursome, because the probability of out-playing a single competitor exceeds that of out-playing three rivals. Additionally, each golfer may exert less effort and therefore perform less well in the foursome, because the lower likelihood of winning reduces the returns to effort. Empirical studies of tournament-type contests have shown that, on average, individuals perform less well when faced with a larger number of competitors, including in retail sales (Casas-Arce and Martı́nezJerez, 2009), research tournaments (Fullerton et al., 2002), and software development (Boudreau, Lacetera, and Lakhani, 2011). Early models of tournaments relied on the assumption that agents had equal ability (Lazear and Rosen, 1981). Subsequent theoretical analysis and empirical research has examined tournaments in which competitors have heterogeneous abilities (Knoeber and Thurman, 1994), including recent work on the effect of completely dominant competitors (Brown, 2011).2 Continuing with the golf analogy, suppose that Tiger Woods replaces one of the members of the foursome-an extreme form of varying the ability of competitors. On his worst day, Tiger Woods will outplay each of the other golfers. In this extreme case, a superstar competitor performs better with zero effort than another competitor performs with maximum effort; the expected rank of performance for each player therefore drops with virtual certainty relative to contests without such a superstar. If contest payoffs are winnertake-all or strongly non-linear with respect to rank, non-superstars have little incentive to exert effort. Evidence of a ``superstar effect'', in which non-superstars perform less well in the presence of a much superior player, has been found in competitive sporting environments such as tennis (Sunde, 2003; Lallemand, Plasman, and Rycx, 2008), and indeed in the case of professional golf in the US (Brown, 2011) and Japan (Tanaka and Ishino, 2012).3 1 Konrad (2009) provides a comprehensive literature review in a range of contest and tournament settings. 2 Szymanski and Valletti (2005) consider the implications of tournaments with unequal ability distributions. Brown (2011) provides a simple formal model of the effects of incentives and strategies created by the presence of a superstar in a professional sports setting. 3 Competitive rank order tournaments may have other disadvantages. Relative to more conventional contracting with a single agent and rewards based on observable outcomes (Holmstrom and Milgrom, 1991), tournaments create redundant costs and efforts by multiple agents on the same task (Che and Gale, 1983; Fullerton and McAfee, 1999). However, by enabling comparisons through relative performance evaluation, tournaments have the advantage of generating more information regarding things such as the dif iculty of the task and the relative skills or efforts of workers in the tournament. And, adding at least a minimum level of competition can stimulate effort by discouraging slack (Taylor, 1995). A large number and 2 Prior research has often attributed the superstar effect to a rational decrease in effort by non-superstars. But in addition, a superstar might affect the behavior and performance of competitors through other channels. Consider this quote from Riley (2012): ``There were a number of years, enough to be called an unmatchable era, when Tiger [Woods] won every single tournament in which he held a third-round lead. Part of this dominance was aided by the thumb-sucking meltdowns of his playing partners; when paired with Tiger, opponents put up notably worse scores than they did with anyone else. And these were great players, major-championship contenders. In the past, guys like Robert Rock would stub themselves right out of contention, their intestines were knotted so tightly. This passage provides a complementary view of the behavioral mechanisms set in motion by intensi ied competition and the associated impact on performance. Riley interprets the reaction of other players to Tiger Woods as a form of ``choking,'' in which psychological pressure, in this case from the presence of a dominant competitor, causes other golfers to perform below their abilities. Research in psychology has established that stress or pressure on individuals to perform well--including time pressure to perform a task, high stakes, the presence of an audience, and social anxiety--causes individuals to perform below their abilities (Baumeister and Showers, 1986; DeCaro et al., 2011).4 Psychological response to competition may take other forms as well. For example, tournament incentives may lead individuals to lower their ``cognitive effort''--also known as mental effort, denoting intensity of mental activity including the degree of voluntary attention or concentration (Kahneman, 1973)--which contrasts with physical or ``labor effort'' (Bracha and Fershtman, 2012). For example, in an experimental study, Bracha and Fershtman (2012) ind that some individuals may work harder in terms of labor effort in a tournament-type contest than under a pay-for-performance reward structure, but cognitive effort may deteriorate, causing worse performance outcomes. Yet other research has suggested that the psychological response to pressure may cause the performance of some competitors to improve rather than decline, particularly for high-ability individuals. In particular, selfcon idence in the task at hand, which may be positively correlated with ability, can counterbalance the debilitating effects of stress in some circumstances (Baumeister and Showers, 1986). In an experimental study involving basketball free throws, Otten (2009) inds that greater ``perceived control'' (self-con idence) has a positive affect on performance under pressure, suggesting a possible explanation for ``clutch'' (better than usual) performance. In addition to the foregoing explanations of reactions to increased competition, both economic and psychological research suggests that the ability of competitors may affect their reactions. For example, economic logic suggests that lower skilled competitors may react less negatively to the presence of a superstar, because a superstar has less impact on the likely rank of these competitors (Brown, 2011). In contrast, high ability competitors on the edge of winning positions may even increase their effort, because they have the most to gain (Casas-Arce and Martı́nez-Jerez, 2009). Psychological research also suggests that the strongest reactions may come from higher ability competitors, because high performance expectations for these competitors, both from others and from the competitors themselves, may create the greatest pressure to perform well (Baumeister and Showers, 1986). Taken as a whole, prior research suggests that in addition to providing evidence regarding the underlying mechanisms that may account for performance outcomes in the face of increased competition, it is useful to examine which competitors react most strongly to increased competition, who they react most strongly to, and whether they react positively or negatively. We analyze these issues using a detailed microeconomic data set from tournament type contests sponsored by an online platform that produces commercial grade enterprise software and algorithmic analytics solutions.5 We obtained data on over 4000 computer programmers who participated in condiverse set of competitors may be especially important in cases in which it is not known ex ante which individual will turn out to be best suited to perform a task (Boudreau, Lacetera, and Lakhani, 2011). 4 A review of the neurobiology literature by Arnsten (2012) covers several recent indings on the interaction of stress and higher brain functions, including the role of stress hormones in causing ``a rapid and dramatic loss of prefrontal cognitive abilities,'' thus impacting an individual's working memory, the short-term memory used during computation. 5 Boudreau, Lacetera, and Lakhani (2011) use the same data set to examine the relationship between an increasing number of contestants and negative incentive effects as well as extreme-value outcomes from the point of view of a contest sponsor. They do not examine individual responses or the role of superstars. 3 tests requiring the creation of software solutions to three algorithmic problems at a time. The data have several desirable features, including random assignment of competitors to virtual rooms in which they compete, multiple observations per individual, a ine-grained measure of ability for individual competitors, and performance outcomes per individual in each contest (namely, a problem-solving score and an indicator of the presence of logical errors). The data structure allows us to estimate the impact of the number and skill distribution of competitors on individual performance, while controlling for programming ability and other individual- and contest-speci ic effects. We also can examine how these effects on individual performance vary by competitor ability. Additionally, the data contain information about the choices and actions of individual competitors, such as the amount of time spent working on a problem. These data provide evidence regarding underlying mechanisms--such as effort, risk taking, and cognitive errors--that may account for performance outcomes in the face of increased competition. We begin by documenting performance outcomes in response to increased competition. As part of this analysis, we bring together two previously separate research streams: one that has focused on the impact of the total number of competitors on performance outcomes, and another that has focused on the impact of superstar competitors. We irst seek to replicate the result that contest participants have worse performance outcomes on average when faced with a larger total number of competitors, or the ``N-effect'' for short.6 Consistent with prior research, our results show that a larger number of competitors leads to worse performance outcomes on average. We then add the presence of superstars to the analysis, and ind that this has an additional negative effect on performance outcomes. Notably, both effects hold in the same setting and data. We then decompose the total number of competitors into superstars and non-superstars, and ind that on average participants react much more negatively to an increase in the number of superstars than to an increase in the number of non-superstars. We also ind that these negative effects are strongest in a particular subgroup of competitors: those that are highly skilled, but whose abilities put them near to the top rather than at the top of the ability distribution. In addition, we ind that a small group of competitors, who are at the very top of the ability distribution (excluding superstars), react positively to increased competition from superstars in particular. Then we turn to an exploration of the underlying causes of these effects, particularly in higher ability competitors that react most strongly to increased competition. For the near-to-the-top competitors that react most negatively, we ind no evidence that increased competition affects observable actions and strategic choices. For example, an increase in the number of superstar and non-superstar competitors does not lead to either lower observable effort or greater risk-taking. Instead, these competitors make more errors of logic in response to greater numbers of superstars and non-superstars, which points to a cognitive explanation for the decline in performance. In addition, for the small group of competitors at the very top of the ability distribution, we ind some evidence of increased effort and no increase in errors of logic, consistent with both an economic argument that competitors on the edge of winning may exert maximum effort and psychological theories of self-con idence under pressure. The paper proceeds as follows. section 2 describes the empirical context in detail. section 3 describes the data set, variables, and estimation approach. section 4 through section 7 present results and analysis. section 8 concludes. 2 Empirical Context: Algorithm Contests at TopCoder Inc. Data for our study comes from TopCoder, Inc., a web-based platform that delivers outsourced software solutions for its clients through the use of online rank-order tournaments involving a member base of over 400,000 registered software developers (often referred to as ``coders''). Established in 2001, TopCoder works with large information technology intensive organizations (e.g., United Technologies, UBS, ING, IMS Health) to identify their software requirements, which it converts into contests for its member base. These contests are open to all software developers registered on the site and last several days or weeks; prize pools vary from $500 to more than $50,000 depending on the contest. Since its founding, the irm has transferred more than $35 million in prize money and peer review fees to its members by conducting more than 10,000 tournaments for over 200 clients. 6 This term was coined by Garcia and Tor (2009) in a study of students taking an SAT test in the presence of greater numbers of test-takers. We use the term in the context of tournaments. 4 TopCoder's value proposition to its clients is the availability of a highly talented pool of software developers that are interested in competing to provide client software solutions. The clearest signal of TopCoder's ability to deliver on the availability of talented software developers is the number of members that have received a skills rating, which measures programming ability. Currently almost 50,000 members have received a TopCoder skills rating, the vast majority through a series of ongoing algorithmic contests called single round match (SRM), which occur online on a weekly to bi-weekly basis. Beyond providing a signal of platform capability to corporate clients, these contests serve to recruit members by creating a tournament environment where developers can demonstrate their skills against a global pool of competitors. In this study, we analyze participation and performance data from these algorithm contests. TopCoder gave us access to the algorithm contest data (2001-2007), and we conducted extensive interviews with the irm's executive team, its clients, and 20 elite members to improve our understanding of the contest setting. 2.1 Contest Structure The main task in an algorithm contest is to solve three computer science algorithm problems in 75 minutes. The problems are ``synthetic'', i.e., a TopCoder employee or member creates the problems in order to challenge the competitors and derive their skills ratings. Mike Lydon, chief technology of icer for TopCoder and the principal designer of the contest framework, explains: ``The problems we pose in the SRM contest are quite demanding and require more than the average amount of mathematical and computer science knowledge along with creativity in exploiting and developing various algorithmic approaches and the ability to translate an abstract problem statement into functional code in 75 minutes or less. To enable consistent testing and ratings of our member base, we design our problems so that we can assess skills in areas as diverse as computational biology, graph theory, image analysis and feature extraction, text mining and semantic analysis, and graphical rendering, amongst many more. Algorithm contests are held on different days of the week and times of day to encourage participation by TopCoder's global membership. TopCoder advertises contest dates and times well in advance through personalized emails and website announcements. Registration to participate in a contest begins three hours before the start time and closes ive minutes before the contest starts. During this period, registered contestants can choose to wait in a virtual chat room, engage in banter with other contestants, and browse information about other registered competitors to get an idea of the skill levels and numbers of those competing in the event. An imprecise measure of skill is readily available in that each coder is listed by handle7 , along with a color code that provides a rough indicator of the coder's programming ability, based on TopCoder's skills ratings (described in more detail in the next subsection). Participants can obtain more detailed information by looking up another coder's public pro ile on TopCoder, which prominently displays a coder's skills rating and the percentile in which this rating places the coder relative to others ). Competition occurs in two divisions, I and II, based on participants' skills ratings. Division I consists of more experienced individuals that have higher ratings of 1,200 or above. Division II consists of novices, unrated, and lower-skilled software developers. Developers compete only against others in their division. Our empirical analysis focuses on Division I. This ensures that our analysis of the impact of competition on performance is not inluenced by novices learning how the contests work, individuals casually trying out the platform, or low skilled developers who may have dif iculty in the competitions regardless of the number and skill distribution of competitors. All registered competitors who have logged into the TopCoder platform, typically in the hundreds, are placed into virtual competition rooms of around 20 competitors. TopCoder places competitors in rooms with others in the same division; room placement within divisions is random. TopCoder's decision to divide registrants into competition rooms of approximately 20 was driven by early feedback from members, who noted that too large a 7 A competitor's handle is the unique pseudonym used to identify the competitor on the TopCoder platform. Most online communities use a similar system. 5 competition room was intimidating and discouraging. TopCoder also discovered that keeping the room size relatively small kept incentives high, because more contestants had a chance to win and place. Importantly, TopCoder distributes the prize purse (if any) based on scoring well within a room. Prior to the start of the contest, a coder does not know which other contest registrants will be in his (or her) competition room. Once the contest begins, a coder has access to a sophisticated ``heads up'' display that provides information about the skill level of each competitor in the room, access to competitors' pro iles, a real-time update that shows which competitors have submitted solutions to which problems, and a live scoreboard showing provisional points awarded per problem to each competitor. The contest format tests a software developer's ability to write code that accurately solves each problem while at the same time rewarding programming speed. Each contest has one easy problem, one problem of medium dif iculty, and one hard problem. The point value of each problem indicates its dif iculty. The most common distribution of point values is 250, 500, and 1000 for the easy, medium, and hard problems, respectively, although points per problem differ across contests due to differences in problem dif iculty, as does the total possible number of points per contest. Participants have no information about the problems or point values until the contest of icially begins. Once the contest starts, participants see the point values for each of the three problems. The participants can then choose to ``open'' any problem in any order and to submit solutions to problems in any order as well. However, as soon as a coder opens a problem by clicking on the problem statement link, the points available to that contestant for the problem start to decline until the individual submits a solution for analysis and testing. Typically, competitors open the three problems in order of dif iculty, from easiest to most dif icult. If a contestant opens more than one problem simultaneously, the scores for all of the problems start to decline in the same manner until submissions are made. In order for a submission to be accepted for further evaluation, the submission must compile and have the proper structure to accept input and produce output.8 Thus, accepted submissions contain no syntax errors or incorrect function names. Once a coder's submission has been accepted, he receives provisional points based on the length of time to submission. If a coder opens a problem but does not submit a solution, the coder receives zero points for that problem. After the 75 minute coding phase, a 15 minute challenge phase ensues. During this period, a coder may challenge the correctness of accepted submissions. For each challenge, a coder submits a test case (an input or set of inputs to the program) for which an accepted submission by another competitor will produce incorrect output.9 If a challenge is accurate, the challenger receives 50 points while the competitor loses all points for the submission. If the challenge is inaccurate, the challenger loses 25 points. On average, individual competitors issue challenges in about one-third of contests, typically issuing one or two challenges when they do. After the challenge phase, the contest moves into an automated testing phase. TopCoder subjects each submission to an automated barrage of test cases and corner conditions to ascertain whether the submission contains logical errors. If a submission fails even one test case, the automated system immediately removes all points provisionally awarded for that submission. Because incorrect submissions receive zero points, in deciding how much time to spend working on a problem before submitting a solution, a coder must balance the reduction in points as time spent working increases against the possibility that more time spent working could increase the likelihood of a correct submission. To calculate total points per contest for each coder, TopCoder sums the number of points for all correct submissions as well as challenge points earned. 2.2 Rating System TopCoder assigns a skills rating to each of its members based on points earned in algorithm contests. The rating is an integer value that is updated at the end of each contest in which a member participates. the syntactical functionality of a software code is similar to the ``hello, world'' test conducted by novice programmers. See http: //en.wikipedia.org/wiki/Hello_world_program. 9 If a challenge is correct, TopCoder incorporates it into the suite of test cases used in the subsequent automated testing phase. 8 Checking 6 TopCoder uses an Elo type of rating system based on rank order performance (in terms of total points earned per contest), similar to the system used in chess and many sports. In this widely-used type of rating system, the underlying model of contest performance is not strategic; individual performance is presumed to depend only on ability relative to other competitors and random noise. The system is designed to uncover a competitor's true ability when contest performance is a noisy signal of ability. Here we brie ly describe the TopCoder rating system. Appendix A provides a fuller explanation of the system. To calculate a coder's rating at the end of each contest, the rating system irst converts the total points scored by each coder into an integer rank in the contest. The system also generates a predicted rank for each competitor, based on a comparison of his pre-contest rating with the pre-contest ratings of other competitors. Both a competitor's actual and predicted rank are adjusted for the number of competitors in the contest, and predicted rank is also adjusted for the variability of a coder's prior ratings; these adjusted ranks are converted to their values in an inverse standard normal distribution. Importantly, a coder's rating depends on where his rank in the contest lies in comparison to his predicted rank based on past contests. For example, if a coder's past history predicts that he will have the 10th highest score in a contest, his rating will decrease if he places 11th. Conversely, his rating will increase if he places 9th. Thus, the system ensures that a less highly rated competitor will not have his rating fall simply because he scores below more highly rated competitors. In fact, a lower-rated competitor needs the presence of higher-rated competitors in order to have a chance of improving his rating. The rating system is designed so that a coder's pre-contest rating is the primary determinant of the post-contest rating. The system attaches less weight to the difference between actual and predicted rank for coders with more contest experience. In addition, a coder's post-contest rating cannot exceed his pre-contest rating by more than a set value, which is an inverse function of the number of times that a coder has been rated. These features insure that the ratings of coders with more contest experience change less over time than do the ratings of less experienced coders. 2.3 Incentives and Motivations Our interviews with TopCoder executives and elite members suggest a number of motivations for participating in the algorithm contests. The most obvious relates to the potential to win prize money. In the early days of the platform, TopCoder grew its member base by offering money to the ``Top Coders'' who were able to win contests. This established the TopCoder platform and its rating system as a legitimate avenue for both testing and recording skills in the software community. After TopCoder had established a critical mass of developers, the company gradually withdrew monetary awards for participating in algorithm contests and instead provided prize money largely through client-speci ic contests. In total, TopCoder has distributed over $1 million in prize money to its algorithm contest participants. In all, about 20 percent of the contests in our data have a monetary reward, with an average of approximately $1,300 in prize money per contest. Prizes are awarded by room, usually to the top two competitors; coders that win a prize receive an average of $51. Another prominent reason to compete in TopCoder algorithm contests is to receive a TopCoder rating. Because the rating measures programming ability, a number of software development irms request that job applicants get a TopCoder rating before applying. Some irms (e.g., Google, Facebook, Microsoft) sponsor algorithm contests to signal the importance they place on the TopCoder rating. In fact, TopCoder receives more referrals to its website from Google's web page for job applicants than from any other source. One TopCoder member noted that obtaining a high rating is non-trivial -- requiring familiarity with many types of algorithms, quick thinking, strong work ethic, attention to detail, and overall ``smarts''. This member also noted that some of his friends had obtained jobs at high pro ile technology irms because of their TopCoder rating. Hence, the rating possesses value through improved job prospects and earnings. In addition to obtaining a TopCoder rating, many developers compete in algorithm contests in order to learn and become better programmers. Developers gain experience during the competition itself, and can learn from other participants in post-competition discussion. TopCoder makes submitted solutions available for competitors to view, and members typically discuss and dissect the solutions after a competition. After each contest, members create a full commentary on each problem and the various solution approaches used in submissions. Members 7 across the entire skills distribution note the advantage of learning through competitions. One highly-rated member, who prominently has displayed his TopCoder rating on his CV, notes on his resume page: ``I regularly participate in and organize a number of programming comptetitions [sic]. As a result, I am familiar with a vast number of algorithmic problems from all areas of computer science. Among the thousands of people who participate in these competitions, I am consistently ranked in the top 50 worldwide. Frankly, I believe that these competitions have taught me more about computer science and programming than all of the university courses. (Ivor Naverinouk, engineer at Google) Finally, TopCoder members and executives report that the competition format is itself motivating. One member stated (Lakhani, Garvin, and Lonstein, 2010): ``To be successful at TopCoder, you must ask yourself, `Are you a competitor?' You need to be able to thrive on competition; you can't be scared of it.'' TopCoder's founder and chairman, Jack Hughes (2012) remarked that: ``Competition - in games, sports, intellectual exercises (chess, science and math) - is motivating because it is at the core of how we improve. It is dif icult to improve at something unless you measure it. Once you measure, you are competing - even if only with yourself - against the clock or to get a better grade for instance. There is real value in simply trying to do something.'' Much as many people join informal, pick-up games of football or basketball, developers enjoy the competition in TopCoder events, especially algorithm contests, as a way to exercise and demonstrate their software skills. The strength of these various incentives and motivations varies by contest and competitor. Most algorithm contests, for example, do not offer monetary prizes. In addition, as noted above, pre-contest ratings are the primary determinant of post-contest ratings for experienced participants, who make up the majority of competitors in the algorithm contests. Certainly, sustained increases or decreases in inal points over multiple contests will change an individual's rating. But for most competitors, the rating system insures that points earned in a single contest do not have a large impact their ratings. As a result, ratings provide a strong incentive mainly for new members and for those with little prior experience. Thus, for many coders, monetary rewards and ratings do not explain why individuals choose to compete in these contests. Rather, opportunities to learn, as well as a desire to exercise programming skills and demonstrate them to others, appear to be strong motivating factors. 3 Dataset, Variables, and Estimation Strategy Our data analysis focuses on the years 2004-2007. In the years prior to this, TopCoder executives experimented with contest formats and room assignment algorithms before settling on a stable approach to the algorithm contests. This dataset is an unbalanced panel, as not all TopCoder members compete in every contest. As noted earlier, we focus on Division I, which comprises the more highly skilled software developers. Our data contain records on over 4000 participants active in 181 events. To assess the impact on performance outcomes of the number and skill distribution of competitors, we analyze competition at the room level. During a competition, the information that coders receive about their competitors occurs at the room level. As noted earlier, TopCoder provides a vivid representation of the number and abilities of competitors in a room, and an up-to-date scoreboard that indicates submissions by each competitor in the room, as well as provisional points awarded for each submission. In addition, the distribution of prizes (if any) and post-coding challenges occur within rooms. 3.1 Outcome Variables As noted earlier, TopCoder measures the problem-solving effectiveness of a participant by summing the total points earned at the end of a contest, denoted by the variable inal points. This is the dependent variable in our initial analysis of the effect of increased competition on performance outcomes. In a subsidiary analysis, we decompose inal points into those for the irst (easy), second (medium dif iculty), and third (hard) problems, and the challenge phase. Final points depend heavily on the amount of time spent working on problems, the correctness of submissions, and points earned during the challenge phase. We use these as dependent variables in an analysis of the mechanisms that underlie the performance outcomes that we observe. The three variables are: minutes spent working, an observable measure of effort per problem: challenge issued, a dummy variable indicating whether a 8 coder issued a challenge, another observable measure of effort; and incorrect submission, a dummy variable that indicates whether a submission failed a challenge or system test due to a mistake in coding caused by a logical error. 3.2 Explanatory Variables 3.2.1 Number of Competitors in the Room The number of competitors in a virtual contest room, number in room, is our irst explanatory variable. Figure 1 illustrates the variation in the number of competitors in our data. The mean number of competitors per room is 18.6, with 85 percent of rooms having 17-20 competitors, providing a 15-20 percent variation in the number of competitors. [Figure 1 about here.] Two factors lead to variation in the number of competitors in a contest room: 1) TopCoder's room allocation system, and 2) coders that register for a contest but do not compete. As coders register for a contest, TopCoder aims to ill each virtual competition room with 20 contestants. Participants do not arrive in even groups of 20, however, causing a mathematical indivisibility problem for the room allocation system. The system deals with this problem by trying to create rooms that have roughly the same number of competitors, with no large imbalances between rooms. In addition, during the three-hour registration period, some members who have signed up may decide not to compete. If some registered contestants do not show up at the start of the contest, it affects the number of competitors in a room, because it is too late for the room allocation system to adjust for their absence. As noted earlier, coders do not receive their room assignments until the contest begins. Although the room allocation system assigns ``no-show'' members to a room during the registration period, other competitors have no knowledge of their presence. No-show members also do not know the identity of the competitors in their room or the nature of the problems before a contest begins. While we cannot directly observe the reasons why members do not show up after registering, investigation of the data indicates that average room sizes are lower on weekdays than on weekends (holding total contest participation constant), suggesting that coders ind it harder to predict their schedules during weekdays than on weekends. Our interviews with participants revealed that reasons for not showing up included getting caught up in work activities, social pressures such as friends and spouses needing their attention, and miscalculating when a TV show would be on (e.g., Battlestar Galactica). Hence, the factors leading to variation in the number of competitors in a room appear to be exogenous to the contests themselves. Nevertheless, in the empirical analysis, we control for the total number of competitors per contest, day of the week, and month (e.g., coders may have greater amounts of vacation time during certain months). 3.2.2 Identifying Dominant Competitors (``Superstars'') Highly dominant competitors make it dif icult for others to rank at the top in a contest. Brown (2011), for example, identi ies a single individual -- Tiger Woods (when he was highly dominant) -- as a superstar, and uses his presence or absence in a tournament to identify a superstar effect. In our context, we identify a set of overwhelmingly dominant competitors whose abilities indicate that they are likely to score well above other competitors in the room. The number of these ``superstars'' in a room is our second explanatory variable. The procedure used to identify superstars begins by generating a predicted score ( inal points) for each competitor in each contest. To do this, for each year in our sample (2004-2007), we irst estimate the following model, using data from all contests in the previous three years: (1) = + Rating + Rating + • : inal points for coder in contest • Rating : pre-contest rating + 9 • Rating : pre-contest rating squared • : contest-speci ic controls Coder ability, measured by a coder's pre-contest TopCoder rating, is likely to be a key determinant of inal points. In addition, we include several control variables. The squared ratings help control for non-linearity in pre-contest ratings.10 Contest-speci ic controls, which improve the precision of the model, include the number of competitors in the contest, the point value of each problem, an indicator of whether prize money was available, year of the contest, and month and day of the week dummy variables. For each year in the sample, to generate predicted scores for each competitor in a given contest, E , the estimated coef icients from equation 1 are applied to data for each coder in that contest. The standard deviations of these itted values are stored for each observation. These vary for each observation based on the particular values of the covariates associated with that particular coder and contest. We then generate a predicted performance interval for the inal score of each competitor in the contest, as follows: [E − 4 × .. , E + 4 × .. ] The prediction interval for the inal score of each competitor in the contest ranges from four standard deviations below the point estimate to four standard deviations above the point estimate, encompassing a relatively large range of potential performance per competitor. These prediction intervals are intended to capture coders' ``competitive proximity'' to other contest participants. Two competitors whose intervals overlap could reasonably expect to place near one another in a contest and are considered close, as illustrated in Figure 2. [Figure 2 about here.] Competitors A and B may also be indirectly proximate if A's interval overlaps that of another competitor whose interval overlaps that of B. In order to account for this situation (and further indirection), we take the union of all prediction intervals, as shown in Figure 2. All competitors in a union of predicted intervals are considered part of the same competitive group. Figure 3 illustrates the process of assigning competitors to competitive groups for an actual room in our dataset. Group numbers are assigned from the lowest expected scores to the highest, starting at 1. If a room contains at least two groups, superstar competitors are those in the highest competitive group. Any individual not in the superstar group is considered a non-superstar. [Figure 3 about here.] This procedure for identifying superstar competitors labels 541 competitors as superstars in at least one contest in our sample. The average room has 1.31 superstar competitors, with a median of 1. Ninety-three percent of rooms have 3 or fewer superstar competitors; 13 percent of rooms have only one competitive group and thus have no superstars. On average, a competition room has 2.93 groups, with a median of 3 groups per room. Figure 4 and Figure 5 display histograms of the number of superstars per room and the number of groups per room, respectively. [Figure 4 about here.] [Figure 5 about here.] In order to simplify presentation of the results, our analysis includes only observations for non-superstar competitors. All results hold when using the full dataset; there are no changes in levels of signi icance and point estimates are very close in magnitude to those presented here. 10 This turns out to have little impact on predicted scores. We omit it in our primary speci ication in the main part of our analysis of the effects of increased competition. 10 3.3 Estimation Approach We seek to estimate the impact on individual performance of increased competition in the contest overall and from competitors of different types, especially highly dominant ones. We estimate a linear model in which the performance outcome for each competitor in each contest depends linearly on numbers of competitors in total and of different types, as well as on other separable factors such as coder ability and speci ic features of the contest environment. We also include individual competitor (coder) ixed effects; the model thus exploits performance variation within individuals across contests. In principle, random assignment of competitors to rooms implies that unobserved coder characteristics do not vary systematically with our explanatory variables of interest. Nevertheless, we include individual competitor ixed effects in order to preclude bias due to unobserved heterogeneity of coders. Our primary speci ication is a linear model with competitor ixed effects that regresses inal points for each competitor in each contest, , on measures of the number of competitors in the room (total, superstar, and/or non superstar), as described in the next section; the speci ication also controls for coder ability and speci ic features of each contest. For purposes of illustration, in Equation 2 below, denotes the total number of competitors in a room faced by competitor in contest . To control for ability, we use a coder's pre-contest TopCoder rating. We also control for features of the contest environment: the point value of each of the three problems in a contest, the number of competitors in the entire contest, a dummy variable indicating whether prize money was offered in the contest, the year of the contest (2004-2007), and dummy variables for day of the week (Saturday omitted) and month (June omitted). The variable indicates individual competitor ixed effects. (2) = + + + + + Because competitors self-select into the TopCoder member base and into each contest, this could bias our estimated coef icients. In particular, self-selection might affect the distribution of coder ability in a contest. If a particular contest draws a larger proportion of highly skilled competitors than usual, although this would not cause the total number of competitors per room to increase, it could lead to a larger number of superstar competitors in a room on average (since TopCoder assigns competitors randomly to rooms). Examination of the data reveals no difference in the distribution of coder ability per contest associated with month or day of the week. Indeed, the only factor that appears to be correlated with the distribution of coder ability per contest is prize money: contests that offer prize money tend to have a larger number of total competitors because these contests draw a larger number of high ability coders. The inclusion in Equation 2 of the indicator for prize money helps to control for possible sample selection bias from this source. This variable also helps to control for any shift in competitor behavior due to the potential to obtain a monetary reward. Table 11 in B provides descriptive statistics for all variables used in our analyses. 4 The Effects of Competition In what follows, we use the speci ication in Equation 2 to irst estimate the impact of the total number of competitors on individual performance, so as to assess whether an N-effect holds. We then add the total number of superstar competitors to the analysis. In addition, we decompose the total number of competitors into the number of superstars and the number of non-superstars. Then we conduct more ine-grained analysis to assess whether the effects of increased competition vary with the ability of competitors, and to further ascertain which competitors may be driving our overall results. We also conduct robustness tests of this analysis using alternate speci ications, before examining mechanisms that may underlie our results. 4.1 The N-Effect As noted earlier, tournament theory implies that an increase in the number of competitors lowers individual effort. In order to check for the presence of an N-effect, we regress inal points on the total number of competitors in 11 the room ( ), using the ixed-effects speci ication in Equation 2. Table 1 reports the results. As predicted by theory and consistent with prior empirical studies, we ind a signi icant but small negative effect on performance outcomes as the number of competitors in the room increases. On average, inal points earned by a competitor fall by about 2 points for each additional competitor in the room. The magnitude of the effect is similar to that seen in earlier work with this dataset Boudreau, Lacetera, and Lakhani (2011).11 [Table 1 about here.] 4.2 The Superstar Effect In tournaments, theory suggests that the presence of competitors of signi icantly higher ability (superstars) will reduce the efforts of those with lower ability (Szymanski and Valletti, 2005; Brown, 2011). We therefore extend the speci ication in Equation 2 by including the number of superstar competitors in a room faced by competitor in contest ( ).12 (3) = + + + + + + This speci ication is motivated by Brown's (2011) study, which compared performance in golf tournaments with and without Tiger Woods (the superstar). In golf, the total number of competitors per tournament is always the same and the number of superstars switches between zero and one, depending on whether or not Tiger Woods competes. In a similar spirit, we estimate the impact of a larger or smaller number of superstars holding the total number of competitors constant. Thus, the superstar effect in Equation 3 can be interpreted as the impact of a competitor switching from non-dominant to dominant, holding the total number of competitors per room constant. In this speci ication, consistent with empirical evidence in Brown (2011) and other studies, replacing a non-superstar with a superstar in the room leads to a reduction in inal points earned by each competitor of about 2.5 points, as shown in Table 2. [Table 2 about here.] Unlike in golf and other sporting events, the total number of competitors in TopCoder tournaments varies. Thus, the total number of competitors per room may increase due to an increase in the number of superstars, an increase in the number of non-superstars, or both. Moreover, the overall effect of increased competition may depend on which types of competitors cause the total number of competitors to vary. A variant of Equation 3 captures these effects by decomposing the total number of competitors into the number of superstars ( ) and non-superstars ( ): (4) = + + + + + + The speci ication in Equation 4 provides an estimate of the effect of adding a superstar to the room while holding the number of non-superstars constant, and vice versa, which is a natural interpretation of the effects of increased competition in the TopCoder context. In contrast, Equation 2 constrains the effect of adding one more superstar to be the same as the effect of adding one more non-superstar, which presumes that the impact of adding a competitor to the room is the same regardless of that individual's ability. In Equation 3, the estimated coef icient on the number of superstars also does not provide an accurate estimate of the full effect of adding one more superstar to the room: the estimated coef icient on N re lects the effect of adding one more superstar when it is constrained to equal that for a non-superstar, and the estimated coef icient on SS re lects any additional effect of one more superstar beyond that captured by the estimated coef icient on N. Although Equation 3 and Equation 4 are similar, Equation 4 more easily provides estimates of the full superstar effect, and more importantly, makes 11 Boudreau, Lacetera, and Lakhani (2011) ind a somewhat higher effect of approximately 5 points per competitor, but at the room-problem level rather than the individual-contest level and under a different speci ication. 12 In our dataset, the number of superstar competitors per room is not particularly correlated with the total number of competitors per room, with a coef icient of correlation of 0.02. 12 it easy to tell which types of competitors account for the overall effect of increased competition on performance. Therefore, we base our subsequent analyses on the speci ication in Equation 4.13 As shown in Table 3, for this speci ication, adding a superstar to the room leads to a reduction in inal points per competitor of 5.7, and adding a non-superstar leads to a reduction in inal points of 2.6. [Table 3 about here.] In summary, consistent with previous work, we ind evidence of small but signi icant performance losses as the total number of competitors increases. We also ind that increased competition from both superstars and nonsuperstars has a negative effect on performance, but the superstar effect is substantially greater. This provides new evidence that at least in some settings, the overall effect of increased competition may in fact re lect a substantial superstar effect. 5 Heterogeniety in the Response to Competition The effects just identi ied apply to the average participant in a contest. However, prior research suggests that the reaction to competition may vary with the ability of competitors. For example, Brown (2011) found that in PGA golf tournaments, Tiger Woods had an effect on the top half of the competitor ield in terms of ability, but not the lower half. Figure 6 depicts the impact of increased competition for each TopCoder rating in our dataset. To generate this igure, we use a two-stage estimation process. In the irst stage, we regress inal points against our control variables, using a linear speci ication with individual competitor ixed-effects (that is, Equation 4 without the number of superstars and non-superstars ).14 In the second stage, we use a locally-weighted OLS speci ication to regress the irst-stage residuals (excluding ixed effects) against our measures of competition, namely, the number of nonsuperstars and the number of superstars in the room. The second-stage regression is run separately for every rating in the dataset, from 1200 to 3754. The regression uses a triangular kernel to weight the observations at each rating; a bandwidth of 300 rating points was used to generate the plots (see Yatchew (1998) and Greene (2003) for additional discussion of the choice of kernel and bandwidth ).15 The coef icient estimates and standard errors for our competition measures from each regression (on the vertical axis) are then plotted against each rating (on the horizontal axis) in Figure 6. [Figure 6 about here.] The plots in Figure 6 indicate large negative effects for competitors in approximately the 83rd to 94th percentiles of the ability distribution. For competitors with lower ratings (below about 2050), the negative effects are much smaller. At the far right of the plots, competitors with very high ratings show some evidence of an increase in score as the number of superstars and non-superstars increases. However, there are few competitors in this range and standard errors grow quite large. 13 Note that because , the estimated coef icient on in Equation 4 is essentially the same as the estimated coef icient on in Equation 3. In Equation 3, ( ) ( ) . 14 In estimating the relationship between ratings and competition effects, we tried three additional approaches: dividing ratings into 10 regions with indicator variables, one-stage kernel estimates using OLS, and the two-stage differenced approach used in Yatchew (1998). All show a similar pattern of behavior. In the two-stage approach that we present, the irst stage estimates may be biased if the competition effects are correlated with our control variables (see Yatchew (1998) for a discussion). However, one-stage OLS and the differenced approach do not allow for the inclusion of competitor ixed-effects. As the irst-stage coef icient estimates with our control variables were similar to the estimates obtained from the differenced approach, we felt that controlling for competitor heterogeneity in the irst stage was the best option. 15 Given an in inite supply of data, the most accurate approach would be to obtain coef icient estimates for each rating in the dataset using only data for competitors with that rating. Data limitations make this infeasible. Therefore, to obtain the coef icient estimates at each rating, we include data on ``nearby'' competitors whose ratings place them within an estimation window that is 300 rating points wide, termed the ``bandwidth'', centered on the rating for which the estimates are to be generated. A relatively wide bandwidth incorporates more data and tends to result in a smoother estimated relationship, but too large a bandwidth may include observations that are not relevant for the rating in question. We tried several bandwidths from 50 to 300 rating points, and all showed a similar relationship. In addition to the choice of bandwidth, which determines which observations are included in the estimation at each rating in the dataset, the observations within the estimation window can be weighted so that observations closer to the rating in question are given more weight than those toward the edges of the estimation window. Data points closer to the rating in question are more informative about the relationship being estimated. The choice of kernel determines this weighting and provides another tool to smooth the estimated relationship. Our plot uses a triangular kernel so the weight falls linearly with the distance from the center of the estimation window. In practice, the choice of kernel is rarely crucial in obtaining estimates (Greene, 2003). 13 Closer examination of the data underlying the plots in Figure 6 enables us to de ine a ``performance deterioration zone'' (PDZ) that encompasses the strongest negative reactions to increased competition. This zone, with ratings in the range of 2051 to 2414, encloses the largest magnitude of negative effects seen in the kernel estimates, as depicted in Figure 6. Although the boundaries of this range are somewhat blurred due to the use of kernel-based methods, the analysis reported next is not sensitive to the precise location of the upper and lower boundaries of the PDZ. To further investigate the sources of these effects of increased competition, we decompose the inal points of each competitor into points earned for each of the three problems and points earned through challenges. Figure 7 depicts plots for points earned on each problem and through challenges, using the same kernel-based technique described earlier. The response seen in total points is mechanically the sum of the responses for these four subscores. As shown in Figure 7, most of the response to increased competition revolves around the third problem. The points earned from problems 1 (easy) and 2 (medium dif iculty) and from challenges show little response to competition across the ability distribution. [Figure 7 about here.] The information from the kernel analysis enables us to estimate regressions that more precisely assess how competitors of different levels of ability react to increased competition. We use indicators of whether a competitor is below, within, or above the PDZ, and interact each indicator with the number of superstars and the number of non-superstars in the room. Table 4 presents estimates using the new speci ication with inal points as the dependent variable, as well as with points earned for each of the problems ( irst, second, and third) and the challenge score as dependent variables. [Table 4 about here.] As anticipated, competitors in the PDZ show signi icant negative effects of increased competition on inal points, which drop by about 7.5 points per additional non-superstar in the room and by about 20 points for each additional superstar. In addition, competitors whose ratings are above the PDZ show a decrease in inal points for each additional non-superstar. These competitors also show an increase in points for each additional superstar, a result to which we return later. For coders below the PDZ, Table 4 shows that these coders react negatively, albeit less strongly, to increased competition. Like the more skilled competitors in the PDZ, lower ability coders react more negatively to the number of superstars than to the number of non-superstars. This reaction centers on problems 1 and 2, most likely because less skilled competitors concentrate on the less dif icult problems. Conversely, more highly skilled coders within and above the PDZ have a signi icant reaction to increased competition only for problem 3, the most dif icult problem. Indeed, we see no signi icant reactions of the more highly skilled coders for the less dif icult problems, which are likely to prove less demanding for these coders and therefore may elicit less of a reaction to increased competition. In section 7, we examine underlying mechanisms that may drive these performance outcomes. We focus on the third problem, which accounts for a large portion of the effect on performance, and on the most affected group of coders in the PDZ and above the PDZ, as doing so provides the most statistical power for identifying the underlying mechanisms. Narrowing the focus to a single problem reduces the dimensions of competitor actions considerably; essentially, we no longer need to consider all actions in triplicate. However, we emphasize that this selection of focus is driven by the dataset, and may not generalize to other contexts. 6 Robustness Before proceeding to an examination of mechanisms that may underpin the effects of increased competition, we conduct robustness tests of our primary competitor ixed effects speci ication. In particular, we examine alternate approaches to controlling for heterogeneity across coders and contests, using the speci ication just reported in Table 4 for the third problem as the basis for comparison. 14 Table 5 reports the results of these robustness tests. Column one reports coef icient estimates from an ordinaryleast-squares (OLS) speci ication with no controls. The speci ication in column two accounts for heterogeneity through the addition of individual competitor ixed effects but without controls for competitor ability or contest features. Notably, in both alternate speci ications, the coef icient estimates and signi icance levels are similar to those in our primary speci ication, shown in column three for ease of comparison. This suggests that the results reported earlier are highly robust, even without controlling for competitor ability, competitor ixed effects, or speci ic contest features. In addition, as shown in column ive, the inclusion of the number of previous contests that a competitor has entered, which accounts for any potential impact of contest experience on performance, has little impact on the results; the estimated coef icients and signi icance levels are almost identical to those in the primary speci ication. [Table 5 about here.] As an alternate method of controlling for heterogeneity in contests, we replace the contest controls with contest ixed effects, omitting competitor ixed effects. (Including ixed effects can account for heterogeneity in competitors or contests, but not both simultaneously.) Using contest ixed effects shifts the identi ication from within competitors, across contests to within contests, across competitors. As shown in column four, the results seen in the primary speci ication for competitors in the PDZ and above the PDZ continue to hold. The coef icient on non-superstars for competitors below the PDZ becomes signi icant and has a positive sign, however. This may be an artifact of the room allocation procedure described earlier, which does not adjust for no-shows (coders who register for a contest but do not compete). No-shows are likely to be concentrated among non-superstars, simply because superstars are rare by de inition; this in turn will drive variation in the number of non-superstars in the room. Having fewer non-superstars due to no-shows especially bene its lower ability contestants, who are below the PDZ, leading to the positive coef icient. In addition to robustness tests of our primary speci ication, we conduct robustness tests involving the superstar effect. First, we replace the variable for the number of superstars with dummy variables indicating whether a room has one, two, three, four, or ive or more superstars, and interact each of these dummy variables with the indicators for coders below the PDZ, in the PDZ, and above the PDZ. The results, shown in Table 12 in B, are generally consistent with those reported earlier, although they exhibit some variability due to lower degrees of freedom for each interaction term. Overall, the results suggest that the reaction to intensi ied competition tends to increase as the number of superstars increases. One concern with the procedure for identifying superstar competitors is that it may create a mechanical effect on performance. As the number of superstar competitors increases and more of the high ability competitors are labeled superstars, the average ability of non-superstar competitors may fall, leading to a negative effect on performance. In order to check that such a selection effect is not driving the results, we run our analysis on a restricted set of rooms in which the number of dominant competitors is less than k (where k is either 3 or 4), and include in the analysis only competitors predicted to have a rank of more than k-1 (where a higher numerical rank indicates placing less well). For this population, the top k-1 competitors are never included in the analysis, even if they are not superstars; the analysis is run with N-k-1 competitors per room, insuring that the average ability of non-dominant competitors varies only with N and not with the number of superstars. The results, shown in Table 6, are consistent with the results from the full sample.16 [Table 6 about here.] 7 Mechanisms and Competitor Actions As our estimates of the effects of increased competition from both superstars and non-superstars appear robust, we continue the analysis by examining underlying mechanisms that may account for these results. In what follows, we continue to focus on points earned on the third problem. 16 In order to improve precision of the estimates for the control variables, all competitors are included in the analysis. For each regression on the restricted set of rooms with less than superstars, only competitors in those rooms are used to estimate the interaction terms shown in Table 6, by interacting these terms with a dummy variable indicating whether an observation comes from a room with less than superstars. 15 To begin, Figure 8 displays kernel estimates of the impact of the number of superstars and non-superstars on: 1. points earned on the third problem, 2. the correctness of submissions, and 3. minutes spent working on submissions. The y-axes are scaled so that the vertical distance in correctness and minutes spent working roughly corresponds to the same vertical distance for points on the third problem. The igure suggests that the response to increased competition seen in points on the third problem is strongly related to the correctness of submissions. Whereas time spent working shows little response to increased competition across the ratings distribution, correctness of submissions closely tracks the changes in points. [Figure 8 about here.] Regression analysis using our primary ixed effects speci ication supports the patterns shown in Figure 8. As indicated earlier, we focus on the most affected groups, namely, coders in the PDZ and above the PDZ. Table 7 reports the estimated effect of increased competition on minutes spent working on all submissions for the third problem, and on minutes spent working on correct submissions only.17 For minutes spent working on all submissions, the only signi icant reaction comes from coders in the PDZ, who take longer to submit as the number of non-superstars increases. For minutes spent working on correct submissions, again only coders in the PDZ have a signi icant reaction, but at the 10 percent level of signi icance and to an increase in the number of superstars rather than non-superstars. Although these estimates suggest some reaction of those in the PDZ, they are inconsistent regarding the cause (superstars or non-superstars), and the reaction to superstars is not highly signi icant. [Table 7 about here.] In contrast to the somewhat inconclusive results for minutes spent working, we ind a highly signi icant increase in incorrect submissions for those in the PDZ in reaction to both superstars and non-superstars, as reported in Table 8.18 Submissions by this group are incorrect about 2 percent more often for each additional nonsuperstar and about 5 percent more often for each additional superstar. In addition, competitors above the PDZ, whose scores on the third problem show a positive response to additional superstars, are estimated to be incorrect about 1 percent less often for each additional superstar, although the effect is statistically insigni icant. A Wald test, however, con irms that the marginal effect for these competitors is signi icantly below that of competitors in the PDZ, who are most strongly affected. [Table 8 about here.] 7.1 Causes of Performance Deterioration These results suggest that the negative effects of increased competition are associated with a greater likelihood of making an error on the most dif icult problem, rather than with an increase in time to submission. This effect is concentrated among coders in the PDZ who are near but not at the top of the ability distribution. Potential causes of increased errors include deliberate (conscious) actions as well as unconscious reactions in the face of increased competition. Next we assess whether deliberate choices by competitors, such as increasing the riskiness of contest strategy or reducing effort, may explain the performance decrements that we observe. 7.2 Risk-taking During contests, competitors may take strategic risks. For example, consider a losing football team that runs a ``Hail Mary'' play in the last seconds of a game to try to eke out a victory. Although risky, if the play succeeds, the team wins the game. Similarly, in algorithm contests, consider the extreme case of a competitor who cares only about placing irst in the room. This is likely to cause the competitor to spend less time working on a problem and to 17 These analyses include only coders that opened the third problem. 18 In order to improve precision of the estimates for the control variables, all competitors are included in the analysis. Competitors who did not open the third problem are coded as having an incorrect submission. Only competitors who opened and submitted a solution to the third problem are used to estimate the interaction terms shown in Table 8, by interacting these terms with a dummy variable indicating whether there was a submission for the third problem. 16 submit a solution earlier than otherwise, which increases the provisional score (before correctness is evaluated). Although this strategy increases the risk that the submission is incorrect and receives zero points, it will increase the points awarded for the problem if it is correct--enabling the competitor to place more highly in the room. More generally, if competitors employ a riskier strategy of this sort, they will submit solutions more quickly with some positive probability. We look for evidence of such an approach in the amount of time spent working on submissions for the third problem. The average time spent working should fall if competitors shift to submitting fast but more error-prone solutions. The second column of Table 7 shows that the only signi icant effect on minutes spent working on submissions comes from coders in the PDZ in the face of an increased number of non-superstars. This effect is positive rather than negative, however, providing no evidence that competitors switch to riskier strategies in the face of increased competition. 7.3 Observed Effort Perhaps the simplest explanation for the decrease in performance that we observe is reduced effort by competitors. As the likely placement of a competitor falls due to increased competition, the expected bene it of exerting effort falls as well. A competitor then may ind it in his interest to reduce opportunity costs by reducing effort. For evidence of reduced effort, we examine three indicators that competitors stop work early, essentially dropping out of the contest. First, we examine whether competitors open the third problem. If a competitor fails to open the third problem, he may no longer be trying to earn points in the contest; he may also still be trying to solve one of the other problems. The former would indicate a ``drop out'' effect, while the latter would not. As shown in Table 7, we do not observe such an effect; an increase in the number of superstars or non-superstars does not signi icantly affect whether competitors open the third problem. The amount of time spent working on the third problem (for coders who have opened this problem) provides a second indicator of effort. Signi icantly less time spent on the third problem may indicate that a competitor has dropped out; it could also indicate greater risk taking, as noted earlier. However, as shown in the second and third columns of Table 7, time spent working on submissions either does not change signi icantly or increases as the number of superstars and non-superstars increases. Lastly, recall that each contest ends with a 15 minute challenge phase, in which coders can examine whether the submissions of other coders contain logical laws. If coders drop out of the contest, we would not expect them to participate in the challenge phase. Hence, the likelihood that a competitor issues a challenge should fall among the group most likely to drop out, namely, those who open the third problem but do not submit a solution. If coders in this group drop out, we would expect to see a negative effect of increased competition on the likelihood of issuing a challenge. As shown in Table 9, for coders in this group, the estimates show essentially no change in the likelihood of issuing a challenge in the face of a larger number of non-superstars. The same holds for coders in the PDZ in the face of a larger number of superstars.19 The only change in behavior comes from coders above the PDZ, who increase rather than decrease challenges when faced with additional superstars, consistent with earlier results showing that the inal points for these coders increase in the presence of a greater number of superstars. [Table 9 about here.] In summary, these results provide no evidence that coders reduce their effort as the number of non-superstars or superstars increases. Coders are no less likely to open the third problem, do not spend less time working on it, and are not less likely to issue a challenge in response to increased competition. 7.4 Cognitive Changes Although we ind no evidence that coders reduce observable effort in response to increased competition, points earned in a contest also depend on the correctness of submissions independent of effort. As noted earlier, errors 19 The interaction terms shown in Table 9 are also interacted with a dummy variable for having opened but not submitted a solution to the third problem. 17 in accepted submissions indicate logical laws in submitted code, which are likely to derive from cognitive factors. In order to investigate the possibility that cognitive factors explain the performance outcomes that we observe, we examine whether increased competition affects the likelihood of an incorrect submission for the third problem (conditional on submitting). We control for time spent working on submissions, because greater time spent working may reduce errors. As shown in Table 10, the results indicate that near-to-the-top competitors in the PDZ are signi icantly more likely to produce faulty code as the number of non-superstars and superstars increases.20 For the same amount of time spent working, these coders make about 1.5 percent more incorrect submissions for each additional nonsuperstar in the room. These coders also make about 4 percent more incorrect submissions for each additional superstar in the room. Hence, even when controlling for observable effort in terms of the amount of time that coders work on solutions, coders in the PDZ make more logical errors as competition increases.21 Here again, we ind evidence that coders react far more strongly to increased competition from superstars than from non-superstars. In addition, we ind no evidence that coders above the PDZ make more logical errors in the face of increased competition from non-superstars or superstars. [Table 10 about here.] Taken together, the evidence suggests that near-to-the-top competitors in the PDZ do not reduce their observable effort but make more logical mistakes. As noted earlier, a variety of cognitive factors could explain these results. For example, these mistakes could stem from a reduction in cognitive effort. That is, the brain may get tired even as coders continue to work on problems. In an experimental study, Bracha and Fershtman (2012) ind that some people may work harder but not smarter under tournament conditions. As an alternative or possibly complementary explanation, psychological pressure from increased competition may lead to choking that in turns leads to more mistakes. Choking occurs even when incentives for superior performance are high (Baumeister and Showers, 1986). For example, individuals have been found to choke when faced with high inancial incentives and competitive stakes (Ariely et al., 2009; Apesteguia and Palacios-Huerta, 2010). In a review of a large amount of evidence in psychology, DeCaro et al. (2011) note that two separate mechanisms cause choking. First, self-consciousness about performing correctly leads to increased attention on the precise steps in learning and executing skills. This in turn disrupts ``procedural'' skill execution that takes place without conscious awareness, commonly seen in sports activities like golf putting, hockey dribbling, and baseball batting. Secondly, distraction caused by undue focus on performing well reduces the amount of working memory available, as in mathematical puzzle-solving (Beilock and Carr, 2001, 2005). In the contests examined here, pressure to perform well could disrupt coders' routine (procedural) approaches to algorithmic problem solving, and a reduction in working memory could affect the ability to solve attention-demanding algorithmic problems. In contrast to near-to-the-top competitors in the PDZ, coders above the PDZ do not make more logical errors, and they exert greater effort by making more challenges when faced with increased numbers of superstars. These competitors, who have predicted ranks just below superstars, do not appear to suffer from cognitive deterioration in problem solving. As noted earlier, for these competitors, self-con idence stemming from high ability may counterbalance any negative impact of pressure to perform well. In addition, the increase in observable effort in the face of additional superstars is consistent with the argument that those on the edge of winning positions have an incentive to exert maximum effort (Casas-Arce and Martı́nez-Jerez, 2009). 8 Summary and Conclusions This study began with the observation that prior research has found that individuals in tournament-style contests perform less well in the face of increased competition, but that studies often lack evidence about the mechanisms that underlie this result, especially in ield settings. We provide evidence regarding three mechanisms that may 20 The interaction terms shown in Table 10 are also interacted with a dummy variable for having submitted a solution to the third problem. 21 Note that the reported coef icients on the time spent working in Table 10 may be biased. It is possible that time spent working and incorrect submissions are both in luenced by an unobserved cognitive variable. However, the coef icients on the competition measures will still be estimated consistently. 18 account for such a performance decline: reduction in effort, increased risk taking, and deterioration in cognitive processing. In the algorithmic programming contests studied here, we ind that the largest negative reactions to increased competition come from a group of competitors who are near-to-the-top in terms of ability. In contrast to the predictions of tournament theory, we ind no evidence that competitors in this group reduce their effort in reaction to increased competition. For example, time spent working on problems, a particularly relevant measure of effort in these contests, does not decrease as the number of competitors increases. We also ind no evidence of increased risk taking. Instead, the evidence shows that competitors in this group make more logical errors when faced with increased competition, especially from superstars, suggesting that cognitive factors at least partly account for the decline in performance. We also ind that a small group of very high ability competitors (excluding superstars) reacts positively to increased competition from superstars. These competitors exert somewhat greater effort during contests, consistent with economic logic that competitors on the edge of winning may exert maximum effort, and cognitive errors do not increase, consistent with psychological research suggesting that very high ability competitors may not succumb to performance pressure. In addition to providing evidence on the mechanisms that underpin changes in performance in reaction to increased competition, this study extends prior research in a number of ways. First, the structure of TopCoder contests enables us to go beyond prior empirical research on the N-effect, by distinguishing between increased competition from superstars and non-superstars. We ind that an additional superstar has a much more negative effect on the performance of the average competitor than an additional non-superstar. This inding also contributes to empirical research on the superstar effect, which has focused heavily on sporting events, by providing evidence from non-sports contests that increased competition from superstars negatively affects the average performance of non-superstars. Our study also shows that the ability of competitors affects their reactions to increased competition. Prior research in both economics and psychology suggests that more highly skilled competitors may have the strongest reactions. We ind that this holds in online algorithm contests. Although lower ability competitors have a significant (and negative) reaction to increased competition, most of the reaction comes from high ability competitors (excluding superstars). Our results show that these high ability competitors react negatively to an increase in non-superstars. This is not surprising, given that non-superstars include some competitors of high ability, who therefore may cause the rank of a high ability competitor to decline. In addition, like lower ability competitors, these high ability coders react more strongly to increased competition from superstars than from non-superstars. However, as noted above, the high ability coders differ in their reactions to superstars. Most react negatively, but a small group of competitors, who are just below superstars in their abilities, have a positive reaction. Although Brown (2011) does not ind a positive reaction of highly skilled players to the superstar Tiger Woods in PGA tournaments, Connolly and Rendleman Jr. (2009) ind that high ability players paired with Woods perform better when both are in contention to win a tournament than when they are not. These mixed indings regarding such next-to-the-top competitors suggest that future research is warranted. Although very high ability coders react positively to increased competition from superstars in the algorithm contests, most competitors react negatively to an increase in both superstars and non-superstars. Strikingly, coders who account for the largest portion of this negative reaction do not reduce observable ``labor'' effort. Instead, we ind that these coders make more logical errors when faced with increased competition. Bracha and Fershtman (2012) provide experimental evidence in tournaments that suggests that reduced cognitive effort may play a role. In addition, although psychology research has not examined choking in tournaments, new experimental evidence suggests that choking may be especially relevant in this setting. This evidence comes from DeCaro et al. (2011), who show that the makeup of the pressure situation affects which of the choking mechanisms (disruption of procedural skill execution or a reduction in working memory caused by distraction) comes into play. In particular, pressure from being watched, termed ``monitoring pressure'', leads to disruption of procedural learning and skill execution.22 In contrast, pressure to earn a reward if a certain outcome is achieved, termed ``outcome pressure'', leads to distraction from the task at hand. Many tournaments contain both types of pressure. Partici22 DeCaro et al. (2011) mention the presence of a mirror or a video camera as creating monitoring pressure from being watched, in addition to watching by other individuals. Thus, being watched in any way results in monitoring pressure. 19 pants watch and are watched by other participants, which could disrupt participants' procedural skill execution. In addition, almost by de inition, tournaments contain rewards for performing well (even if simply ranking highly), which could distract participants from devoting full attention to the task at hand. More generally, a better understanding of behavioral responses in contests can aid both public policy and contest designers. The use of contests to elicit creative effort and technological innovation has gained renewed interest in both the public and private sectors (Tapscott and Williams, 2006; National Research Council, 2007; McKinsey & Company, 2009; Zients, 2010). With further work to understand what triggers performance losses, contest designers may be able to avoid generating nuisance effects in the contest environment such as choking. While developing a resilience to choking may be bene icial for athletes whose jobs entail participating in competitive sporting events, such resilience is unlikely to be critical for software development, scienti ic research, and creative skills now sought in online contests. This suggests that sponsors of such contests face the challenge of inding ways to reduce the negative effects of cognitive factors so that contests can better measure ability and provide incentives for performance. 9 References References Apesteguia, J., Palacios-Huerta, I., 2010. Psychological pressure in competitive environments: Evidence from a randomized natural experiment. American Economic Review 100, 2548--2564. Ariely, D., Gneezy, U., Loewenstein, G., Mazar, N., 2009. Large stakes and big mistakes. Review of Economic Studies 76, 451--469. Arnsten, A. F. T., 2012. Stress signalling pathways that impair prefrontal cortex structure and function. Nature Reviews Neuroscience 13, 410--422. Autor, D. H., 2001. Wiring the labor market. The Journal of Economic Perspectives 15, 25--40. Baumeister, R. F., Showers, C. J., 1986. A review of paradoxical performance effects:choking under pressure in sports and mental tests. European Journal of Social Psychology 16, 361--383. Beilock, S. L., Carr, T. H., 2001. On the fragility of skilled performance: What governs choking under pressure? Journal of Experimental Psychology: General 130, 701--725. Beilock, S. L., Carr, T. H., 2005. When high-powered people fail : Working memory and ''choking under pressure'' in math. Psychological Science 16, 101--105. Boudreau, K. J., Lacetera, N., Lakhani, K. R., 2011. Incentives and problem uncertainty in innovation contests: An empirical analysis. Management Science 57 (5), 843--863. Bracha, A., Fershtman, C., 2012. Competitive incentives:working harder or working smarter? Management Science Forthcoming (5-12), --. Brown, J., 2011. Quitters never win: The (adverse) incentive effect of competing with superstars. Journal of Political Economy 119 (5), 982--1013. Brunt, L., Lerner, J., Nicholas, T., 2011. Inducement prizes and innovation. Discussion paper SAM 25 2011, Norwegian School of Economics. Casas-Arce, P., Martı́nez-Jerez, F. A., 2009. Relative performance compensation, contests, and dynamic incentives. Management Science 55, 1306--1320. Che, Y.-K., Gale, I., 1983. Optimal design of research contests. American Economic Review 93, 646--671. 20 Connolly, R. A., Rendleman Jr., R. J., February 2009. Dominance, intimidation and `choking' on the pga tour. Journal of Quantitative Analysis in Sports 5, 1--32. DeCaro, M. S., Thomas, R. D., Albert, N. B., Beilock, S. L., 2011. Choking under pressure: Multiple routes to skill failure. Journal of Experimental Psychology: General 140, 390–406. Fullerton, R. L., Linster, B. G., McKee, M., Slate, S., 2002. Using auctions to reward tournament winners: Theory and experimental investigations. The RAND Journal of Economics 33, 62--84. Fullerton, R. L., McAfee, R. P., June 1999. Auctioning entry into tournaments. Journal of Political Economy 107 (3), 573--605. Garcia, S. M., Tor, A., 2009. The n-effect more competitors, less competition. Psychological Science 20 (7), 871--877. Greene, W. H., 2003. Econometric Analysis. Prentice-Hall, Upper Saddle River, NJ. Holmstrom, B., Milgrom, P., 1991. Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design. Journal of Law, Economics, & Organization 7, 24--52. Kahneman, D., 1973. Attention and effort. Prentice-Hall, Englewood Cliffs, NJ. Knoeber, C. R., Thurman, W. N., 1994. Testing the theory of tournaments: An empirical analysis of broiler production. Journal of Labor Economics 12, 155--179. Konrad, K. A., 2009. Strategy and Dynamics in Contests. Oxford University Press. Kremer, M., 1998. Patent buy-outs: A mechanism for encouraging innovation. Quarterly Journal of Economics 4 (6304), 1137--1167. Kremer, M., Williams, H., 2010. Incentivizing innovation: Adding to the tool kit. In: Innovation Policy and the Economy. University of Chicago Press, pp. 1--17. Lakhani, K. R., Garvin, D. A., Lonstein, E., 2010. Topcoder (a): Developing software through crowdsourcing. Case Study 9-610-032, Harvard Business School. Lallemand, T., Plasman, R., Rycx, F., 2008. Women and competition in elimination tournaments evidence from professional tennis data. Journal Of Sports Economics 9, 3--19. Lazear, E. P., Rosen, S., 1981. Rank-order tournaments as optimum labor contracts. Journal of Political Economy 89, 841--864. McKinsey & Company, October 2009. And the winner is: Capturing the power of philanthropic prizes. Online. URL http://www.mckinsey.com/clientservice/Social_Sector/our_practices/Philanthropy/ Knowledge_highlights/And_the_winner_is.aspx. Moldovanu, B., Sela, A., 2001. The optimal allocation of prizes in contests. American Economic Review 91, 542-558. National Research Council, 2007. Innovation Inducement Prizes at the National Science Foundation. The National Academies Press, Washington, DC. Otten, M., 2009. Choking vs. clutch performance: A study of sport performance under pressure. Journal of Sport and Exercise Psychology 31, 583--601. Prendergast, C., 1999. The provision of incentives in irms. Journal of Economic Literature 37, 7--63. Riley, D., 2012. New tiger, old stripes. Gentlemen's Quarterly. Scotchmer, S., 2004. Innovation and incentives. MIT Press. 21 Sunde, U., December 2003. Potential, prizes and performance: Testing tournament theory with professional tennis data. Discussion Paper 947, Institute for the Study of Labor, IZA P.O. Box 7240 D-53072 Bonn Germany. Szymanski, S., Valletti, T. M., 2005. Incentive effects of second prizes. European Journal of Political Economy 21, 467--481. Tanaka, R., Ishino, K., 2012. Testing the incentive effects in tournaments with a superstar. Journal of The Japanese and International Economies in press., xxx--xxx. Tapscott, D., Williams, A. D., 2006. Wikinomics: How Mass Collaboration Changes Everything. Penguin, New York. Taylor, C. R., 1995. Digging for golden carrots: An analysis of research tournaments. American Economic Review 85, 872--890. Terwiesch, C., Xu, Y., 2008. Innovation contests, open innovation, and multiagent problem solving. Management Science 54, 1529--1543. Wright, B. D., 1983. The economics of invention incentives: Patents, prizes, and research contracts. The American Economic Review 73, 691--707. Yatchew, A., 1998. Nonparametric regression techniques in economics. Journal of Economic Literature 36, 669-721. Zients, J. D., March 2010. Guidance on the use of challenges and prizes to promote open government. Memorandum for the Heads of Executive Departments and Agencies. URL http://www.whitehouse.gov/omb/assets/memoranda_2010/m10-11.pdf Appendices A The TopCoder Rating System Here we describe the main elements of the TopCoder rating system. The rating system is described in greater detail at: HTTP://WWW.TOPCODER.COM/WIKI/DISPLAY/TC/ALGORITHM+COMPETITION+RATING+SYSTEM. The formula for a coder's rating is: NewRating = OldRating + Weight ⋅ PerfAs 1 + Weight A coder's rating is updated at the end of each contest to produce NewRating.23 OldRating is the coder's pre-contest rating. If a coder has never competed in a TopCoder algorithm contest, TopCoder assigns a value of 1200 to OldRating. Rearranging terms, based on the formula for PerfAs below, yields: NewRating = OldRating + Weight ⋅ CF ⋅ (APerf − EPerf) 1 + Weight PerfAs is the provisional rating assigned to each coder at the end of a contest. PerfAs = OldRating + CF ⋅ (APerf − EPerf) 23 On the TopCoder website, in the explanation of the rating system, the variable Rating is sometimes used in place of what we term OldRating. We use NewRating and OldRating for clarity. 22 APerf is the coder's rank order performance in the contest, calculated as a value in an inverse standard normal distribution that adjusts for the number of coders per contest: APerf = −Φ ARank − 0.5 NumCoders where ARank is the coder's rank in a contest, based on total points per coder and NumCoders is the number of coders in the contest. EPerf is the predicted value of APerf, based on the coder's pre-contest rating relative to the pre-contest ratings of other contestants: EPerf = −Φ ERank − 0.5 NumCoders where ERank = 0.5 + ∑ . , or Win Probability, is the probability that the coder will have a higher score than another coder in the contest. Each Win Probability is calculated based on the pre-contest ratings of coders that entered the contest, adjusted for a measure of the spread of each coder's prior contest ratings, termed Volatility. Coders that have never competed before receive an initial value of 300 for Volatility. In the formula for PerfAs, CF denotes a ``Competition Factor'' for each contest. CF captures the spread of the pre-contest ratings of coders in the contest, based on both pre-contest Volatilities of the contestants and a measure of the difference between the average pre-contest rating of contestants and individual coder pre-contest ratings. A greater spread of pre-contest ratings results in a higher competition factor, leading to a higher weight on the difference between a coder's actual and anticipated performance. Intuitively, changes in rank order performance in a contest where coders have similar abilities, as measured by pre-contest ratings, are more likely to re lect random factors rather than skill, and therefore receive lower weight in calculating the new rating. Finally, in the formula for NewRating, Weight for each coder is an inverse function of the number of times that the coder has been rated previously. More experienced coders have less weight attached to the difference between their current rank order performance, APerf, and their predicted rank order performance as re lected in EPerf. In addition, a coder's NewRating cannot exceed his or her OldRating by more than a set value termed Cap, which is an inverse function of the number of times that a coder has been rated. The values of Weight and Cap insure that the ratings of more experienced coders change less over time than do the ratings of less experienced coders. B Large Tables [Table 11 about here.] [Table 12 about here.] 23 Figures Figure 1: Number of Competitors in Room. 40 30 Percent . 20 10 0. 15 16 17 18 19 Number in Room 20 21 24 Figure 2: Illustration of how competitive proximity is de ined. (a) Competitors are ordered by predicted score. . A B Points (b) Two competitors with overlapping prediction intervals are considered close. . A B Points (c) Two competitors whose prediction intervals overlap with a third are considered close. . A C B Points (d) Two competitors whose prediction intervals are connected by any number of intervening intervals are considered close. . A C D E B Points Figure 3: Illustration of the identi ication of superstar competitors. (a) Scores predicted from previous years' performance. . 1 2 3 5 6 78 9 10 11 12 13 14 16 15 18 19 1720 Points (b) ± std. dev. prediction intervals from previous years' performance. . 1 2 3 5 6 78 9 10 11 12 13 14 16 15 18 19 1720 Points (c) Union of overlapping intervals. Group 1 . 1 2 3 Group 2 5 6 78 9 Group 3 10 11 12 Group 4 13 14 Group 5 16 15 18 19 1720 Points (d) Highest group labelled superstars. Superstars . 1 2 3 5 6 78 9 10 11 12 13 14 16 15 18 19 1720 Points 25 Figure 4: Number of Superstars in competition rooms. 70 60 50 Percent 40 30 20 10 0. 0 4 1 2 3 Number of Superstars 5 Figure 5: Number of groups in competition rooms. 30 25 20 Percent 15 10 5 0. 1 2 3 4 5 6 7 Number of Groups 8 9 26 Figure 6: Response to competition across TopCoder ratings using a locally-weighted, kernel approach. 40 Performance Deterioration Zone (PDZ) 20 Points / SS (NSS) 0 −20 −40 . 1,400 . . 1,600 1,800 2,000 2,200 2,400 TopCoder Rating . Number of Non-Superstar Competitors . . ± 1 .s.d. . Number of Superstar Competitors ± 1 .s.d. 27 Figure 7: Response to competition for the three contest problems and challenge phase using kernel techniques. Total Points 25 0 −25 . Challenge Points 25 0 −25 . Points / SS (NSS) Problem 1 25 0 −25 . Problem 2 25 0 −25 . Problem 3 25 0 −25 . 1,400 1,600 1,800 2,000 2,200 2,400 TopCoder Rating . . . Number of Non-Superstar Competitors . . ± 1 .s.d. . Number of Superstar Competitors ± 1 .s.d. 28 Figure 8: Response to competition on third problem: points, correctness, and time worked, using kernel techniques. Points / SS (NSS) Problem 3 Score 25 0 −25 . Correctness of Submission 0.03 0 Correctness / SS (NSS) −0.03 . Minutes Spent Working 2.3 0 −2.3 . 1,400 1,600 1,800 2,000 2,200 2,400 TopCoder Rating . . . Number of Non-Superstar Competitors . . ± 1 .s.d. . Number of Superstar Competitors ± 1 .s.d. Mins. / SS (NSS) 29 Tables Table 1: The response of inal points to number of competitors in the room. Covariates Number in Room TopCoder Rating 1 Problem Point Value 2 3 Problem Point Value Problem Point Value Final Points −2.593⋆⋆⋆ (0.975) 0.0731⋆⋆⋆ (0.00675) −0.724 (0.0525) ⋆⋆⋆ Table 2: The response of inal points to the number of superstars and number of competitors in the room. Covariates Number in Room Number of Superstars TC Rating 1 Problem Point Value 2 3 Problem Point Value Problem Point Value Final Points −2.606⋆⋆⋆ (0.974) −2.670⋆⋆⋆ (0.845) 0.0727⋆⋆⋆ (0.00675) −0.724⋆⋆⋆ (0.0524) −0.497⋆⋆⋆ (0.0284) −0.147⋆⋆⋆ (0.0215) 0.022 (0.0236) 10.76⋆⋆⋆ (2.698) −5.641⋆ (3.23) 208.0⋆⋆⋆ (11.16) Yes Yes 50,130 4,432 −0.495⋆⋆⋆ (0.0284) −0.146⋆⋆⋆ (0.0215) 0.0245 (0.0236) 10.54 (2.698) −5.843 (3.229) ⋆⋆⋆ Number in Contest Was Money Paid Contest Year Constant Month Dummies Day of Week Dummies Observations Number of Competitors ⋆ 208.2 (11.16) Yes Yes 50,130 4,432 ⋆⋆⋆ Number in Contest Was Money Paid Contest Year Constant Month Dummies Day of Week Dummies Observations Number of Competitors Competitor Fixed-effects, Standard errors in parentheses, ⋆⋆⋆ . , ⋆⋆ . ,⋆ . Competitor Fixed-effects, Standard errors in parentheses, ⋆⋆⋆ . , ⋆⋆ . ,⋆ . 30 Table 3: The response of inal points to the number of superstars and non-superstars in the room. Covariates Number of Non-Superstars Number of Superstars TC Rating 1st Problem Point Value 2nd Problem Point Value 3rd Problem Point Value Number in Contest Was Money Paid Contest Year Constant Month Dummies Day of Week Dummies Observations Number of Competitors Final Points −2.606⋆⋆⋆ (0.974) −5.277⋆⋆⋆ (1.293) 0.0727⋆⋆⋆ (0.00675) −0.724⋆⋆⋆ (0.0524) −0.497⋆⋆⋆ (0.0284) −0.147⋆⋆⋆ (0.0215) 0.0220 (0.0236) 10.76⋆⋆⋆ (2.698) −5.641⋆ (3.230) 207.6⋆⋆⋆ (11.14) Yes Yes 50,139 4,432 Competitor Fixed-effects, Standard errors in parentheses, ⋆⋆⋆ . , ⋆⋆ . ,⋆ . 31 Table 4: Response to competition for the three contest problems and challenge phase. Covariates Below PDZ × Number of Non-Superstars PDZ × Number of Non-Superstars Above PDZ × Number of Non-Superstars Below PDZ × Number of Superstars PDZ × Number of Superstars Above PDZ × Number of Superstars Constant Final Points −1.930⋆ (1.031) −7.608⋆⋆ (3.144) -9.775 (6.447) −4.388⋆⋆⋆ (1.347) −19.83⋆⋆⋆ (5.615) 31.00⋆ (16.57) 203.8⋆⋆⋆ (11.17) 1 Problem Points −0.760⋆ (0.427) -1.194 (1.302) -2.131 (2.669) −1.442⋆⋆⋆ (0.558) -0.273 (2.324) -3.274 (6.86) 168.6⋆⋆⋆ (4.623) 2 Problem Points −2.084⋆⋆⋆ (0.629) -1.296 (1.918) -2.743 (3.932) −2.764⋆⋆⋆ (0.822) -3.376 (3.425) 3.73 (10.11) 52.86⋆⋆⋆ (6.812) 3 Problem Points 0.473 (0.603) Challenge Points 0.376⋆⋆ (0.177) 0.111 (0.54) 2.822⋆⋆ (1.107) -0.0866 (0.231) 0.0542 (0.964) -1.385 (2.845) 0.9 (1.917) −5.141⋆⋆⋆ (1.84) −7.713⋆⋆ (3.772) -0.127 (0.788) −15.97⋆⋆⋆ (3.285) 31.98⋆⋆⋆ (9.695) −18.75⋆⋆⋆ (6.533) Includes: PDZ Indicators (Below Omitted), Skill Control, and Contest Controls; Observations: 50,139; Number of competitors: 4,432; Competitor Fixed-effects, Standard errors in parentheses, ⋆⋆⋆ . , ⋆⋆ . ,⋆ . Table 5: Response to competition for points on the third problem, with varying controls and speci ications. Covariates Below PDZ × Number of Non-Superstars PDZ × Number of Non-Superstars Above PDZ × Number of Non-Superstars Below PDZ × Number of Superstars PDZ × Number of Superstars Above PDZ × Number of Superstars Constant Skill Control Contest Controls Competitor Fixed-Effects Experience Control Contest Fixed-Effects OLS No Controls -0.198 (0.568) −5.843⋆⋆⋆ (1.825) −7.157⋆ (3.719) -1.153 (0.746) −18.94⋆⋆⋆ (3.257) 24.64⋆⋆⋆ (9.544) 18.15⋆⋆⋆ (0.678) No No No No No . OLS 0.767 (0.568) −4.870⋆⋆⋆ (1.795) −6.671⋆ (3.649) 0.137 (0.739) −16.40⋆⋆⋆ (3.196) 28.66⋆⋆⋆ (9.357) −11.92⋆ (6.176) No No Yes No No ,⋆ . Primary Speci ication 0.473 (0.603) −5.141⋆⋆⋆ (1.84) −7.713⋆⋆ (3.772) -0.127 (0.788) −15.97⋆⋆⋆ (3.285) 31.98⋆⋆⋆ (9.695) −18.75⋆⋆⋆ (6.533) Yes Yes Yes No No Contest FE 1.323⋆⋆ (0.585) −4.281⋆⋆ (1.74) −8.324⋆⋆ (3.514) 1.122 (0.745) −15.50⋆⋆⋆ (3.085) 18.32⋆⋆ (9.011) 28.60⋆⋆⋆ (0.736) Yes No No No Yes Competitor FE Experience 0.497 (0.603) −5.153⋆⋆⋆ (1.839) −7.654⋆⋆ (3.771) -0.0935 (0.788) −16.06⋆⋆⋆ (3.285) 32.08⋆⋆⋆ (9.694) −23.80⋆⋆⋆ (6.675) Yes Yes Yes Yes No Includes: PDZ Indicators (Below Omitted), Skill Control, and Contest Controls; Observations: 50,139; Number of competitors: 4,432; Competitor Fixed-effects, Standard errors in parentheses, ⋆⋆⋆ . , ⋆⋆ 32 Table 6: Response to competition with restricted numbers of superstars. Covariates Below PDZ × Number of Non-Superstars PDZ × Number of Non-Superstars Above PDZ × Number of Non-Superstars Below PDZ × Number of Superstars PDZ × Number of Superstars Above PDZ × Number of Superstars Constant All Rooms 0.473 (0.433) −5.141⋆⋆⋆ (0.0052) −7.713⋆⋆ (0.0409) -0.127 (0.872) −15.97⋆⋆⋆ (1.16E-06) 31.98⋆⋆⋆ (0.000974) −26.75⋆⋆ (0.0416) Rooms with < 3 Superstars 0.0743 (0.514) −3.956⋆ (0.0709) 11.59⋆⋆ (0.0339) -0.0348 (0.975) −28.98⋆⋆⋆ (2.14E-08) 73.71⋆⋆⋆ (-4.17E-08) −18.46⋆⋆⋆ (0.00558) . ,⋆ . Rooms with < 4 Superstars 0.1 (0.398) 0.631 (0.811) 11.14 (0.22) -0.135 (0.876) −22.31⋆⋆⋆ (3.65E-06) 42.75⋆⋆ (0.0116) −18.18⋆⋆⋆ (0.00656) Includes: PDZ Indicators (Below Omitted), Skill Control, and Contest Controls; Observations: 50,139; Number of competitors: 4,432; Competitor Fixed-effects, Standard errors in parentheses, ⋆⋆⋆ . , ⋆⋆ Table 7: Response to competition on third problem: likelihood of opening the problem and time spent working. Min. Worked on 3 Submissions 0.643⋆⋆ (0.322) 0.153 (0.519) 0.969 (0.598) -0.0948 (1.247) 33.37⋆⋆⋆ (2.103) Min. Worked on Correct 3 Submissions 0.385 (0.435) -0.0693 (0.658) 1.344⋆ (0.809) -0.00679 (1.572) 33.34⋆⋆⋆ (2.088) Covariates PDZ × Number of Non-Superstars Above PDZ × Number of Non-Superstars PDZ × Number of Superstars Above PDZ × Number of Superstars Constant 3 Not Opened -0.00108 (0.00554) 0.00391 (0.0114) -0.0105 (0.00989) 0.00357 (0.0292) 0.297⋆⋆⋆ (0.0196) Includes: PDZ Indicators (Below Omitted), Skill Control, and Contest Controls; Observations: 50,139 (column 1), 6,174 (columns 2 & 3); Number of competitors: 4,432 (column 1), 1,461 (columns 2 & 3); Competitor Fixed-effects, Standard errors in parentheses, ⋆⋆⋆ . , ⋆⋆ . ,⋆ . 33 Table 8: Response to competition on third problem: likelihood of errors in submissions. 3rd Incorrect cond. on submitting Covariates PDZ × Number of Non-Superstars Above PDZ × Number of Non-Superstars PDZ × Number of Superstars Above PDZ × Number of Superstars Constant Includes: PDZ Indicators (Below Omitted), Skill Control, and Contest Controls; Observations: 50,139; Number of competitors: 4,432; Competitor Fixed-effects, Standard errors in parentheses, ⋆⋆⋆ . , ⋆⋆ . ,⋆ . 0.0209⋆⋆⋆ (0.00487) 0.0116 (0.00792) 0.0495⋆⋆⋆ (0.00903) Table 10: Likelihood of incorrect submissions for the third problem controlling for minutes spent working. -0.0118 (0.0192) 1.039⋆⋆⋆ (0.00995) Covariates PDZ × Number of Non-Superstars Above PDZ × Number of Non-Superstars PDZ × Number of Superstars Above PDZ × Number of Superstars PDZ × Min. Spent Working 3rd Incorrect cond. on submitting 0.0149⋆⋆⋆ (0.00486) 0.00930 (0.00788) 0.0408⋆⋆⋆ (0.00900) -0.0162 (0.0191) 0.00709⋆⋆⋆ (0.000399) 0.00662 (0.000593) 1.037⋆⋆⋆ (0.00990) Table 9: Likelihood of competitors, who opened but did not submit the third problem issuing a challenge. Issued Challenge -0.0116 (0.0255) -0.0523 (0.0962) 0.0627 (0.0545) 0.542⋆ (0.323) 0.279⋆⋆⋆ (0.0217) Above PDZ × Min. Spent Working Constant Covariates PDZ × Number of Non-Superstars Above PDZ × Number of Non-Superstars PDZ × Number of Superstars Above PDZ × Number of Superstars Constant Includes: PDZ Indicators (Below Omitted), Skill Control, and Contest Controls; Observations: 50,139; Number of competitors: 4,432; Competitor Fixed-effects, Standard errors in parentheses, ⋆⋆⋆ . , ⋆⋆ . ,⋆ . Includes: PDZ Indicators (Below Omitted), Skill Control, and Contest Controls; Observations: 50,139; Number of competitors: 4,432; Competitor Fixed-effects, Standard errors in parentheses, ⋆⋆⋆ . , ⋆⋆ . ,⋆ . 34 Table 11: Descriptive statistics Variable TopCoder Rating Number of Contests (prior experience) Final Points in Contest Final Points on Final Points on Final Points on Final Points on Final Points on Final Points on Challenge Points Minutes Worked on Problem Incorrect Minutes Worked on ⋆ ⋆ ⋆ Mean 1603.82 35.63 246.81 129.28 84.44 29.99 142.42 162.76 243.58 2.86 Problem 3.92 0.95 Problem (cond. on submitting) 31.83 0.58 352.1 18.61 1753.89 256.63 511.61 985.65 1.31 17.3 2.93 1323.4 16.2 8.17 8.34 2.1 15.46 1.99 1.15 Std. Dev. 329.24 35.91 256.67 99.1 145.84 132.93 94.6 168.08 302.48 37.56 11.46 0.22 13.31 0.49 127.82 1.06 53.98 20.98 41.39 51.67 1.21 1.61 1.36 2205.9 6.65 3.23 2.52 1.8 2.53 1.57 1.28 Min 1200 1 -400 0 0 0 0 0 0 -500 0 0 0.15 0 113 13 1550 200 400 750 0 2 1 0 7 0 0 0 7 0 0 Max 3375 273 1972.53 308 643 990 308 643 990 700 74.88 1 74.88 1 619 20 1950 375 675 1200 16 20 9 5032 34 20 16 9 20 8 8 Problem Problem Problem ⋆ Problem (cond. on submitting) Problem (cond. on submitting) Problem (cond. on submitting) Problem Incorrect (cond. on submitting) Number of Competitors in Contest Number of Competitors in Room Total Points Available in Contest Point Value of First Problem Point Value of Second Problem Point Value of Third Problem Number of Superstars per room Number of Non-superstars per room Number of Groups in room Dollars Paid Out in Prizes Number of Rooms per Contest Number of Blue Coders per Room Number of Yellow Coders per Room Number of Red Coders per Room Number of Below PDZ Coders per Room Number of PDZ Coders per Room Number of Above PDZ Coders per Room ⋆ ⋆⋆ Includes coders who did not make submissions and therefore earned 0 points ⋆⋆ Calculated as the sum of the point values for the three problems, exclusive of challenge points 35 Table 12: Response to competition using dummy variables to indicate number of superstars. Covariates Below PDZ × Number of Non-Superstars PDZ × Number of Non-Superstars Above PDZ × Number of Non-Superstars Below PDZ × 1 Superstar Below PDZ × 2 Superstars Below PDZ × 3 Superstars Below PDZ × 4 Superstars Below PDZ × 5+ Superstars PDZ × 1 Superstar PDZ × 2 Superstars PDZ × 3 Superstars PDZ × 4 Superstars PDZ × 5+ Superstars Above PDZ × 1 Superstar Above PDZ × 2 Superstars Above PDZ × 3 Superstars Above PDZ × 4 Superstars Above PDZ × 5+ Superstars Constant Includes: PDZ Indicators (Below Omitted), Skill Control, and Contest Controls; Observations: 50,139; Number of competitors: 4,432; Competitor Fixed-effects, Standard errors in parentheses, ⋆⋆⋆ . , ⋆⋆ . ,⋆ . Final Points -1.46 (0.981) −7.521⋆⋆ (3.14) -9.351 (6.444) −7.420⋆⋆ (3.344) −8.237⋆ (4.471) -9.513 (6.22) −16.40⋆ (8.421) −30.30⋆⋆⋆ (9.43) -1.599 (12.73) −54.62⋆⋆⋆ (15.74) −61.91⋆⋆⋆ (23.43) 26.22 (38.08) -38.97 (55.47) 15.9 (45.02) 71.68 (49.4) -3.218 (73.09) No Obs. No Obs. 232.7⋆⋆⋆ (11.7) 36