Beyond the Wall: Mapping Twitter in China

In this paper, we map and analyze the structure and content found on Twitter centered around users in mainland China. This study offers a rare look at the activity of Chinese Internet users on a platform that is largely unregulated by the state and only reachable through the use of tools that circumvent state-mandated Internet filters. For Internet users who reside in mainland China, Twitter offers access to news from around the world and a wealth of ideas and perspectives that might otherwise be unavailable there, as well as a platform for building online communities that is not under direct control of the government. This study of Chinese Twitter, to our knowledge the first such study, offers a unique window into the online activities and global connections of Chinese Internet users who actively circumvent content restrictions. Based on a mixed-methods approach, combining social network analysis and a qualitative review of the content and activity of Chinese Twitter, we are able to map and provide detailed accounts of the topically based clusters that form among these networks. We identify 36 clusters that focus primarily on three areas: politics, technology, and entertainment. From one perspective, the discourse in the politically engaged portions of Chinese Twitter suggests that Twitter serves as an alternative public sphere. The political group is formed of journalists, lawyers, human rights activists, and scholars, who are free to discuss topics typically not permitted in China, such as the Tiananmen Square protests, Tibetan and Uyghur issues, political scandals, and pollution. Yet China’s Internet repression is clearly succeeding. Chinese Twitter falls well short of supporting a broadly accessible networked public sphere. The proportion of the Chinese populace with direct access to the debates, communities, and shared resources on Twitter is relatively small, and the avenues by which such discourse might find its way into mainstream political discussion are severely constrained. The firewall between Twitter and the much larger social media platforms in China remains a formidable barrier.

Social media monitoring and filtering was initially applied to blogging platforms by requiring the platforms themselves to police content. 2 This system has been expanded to the full range of current social media platforms visited by a large and growing number of users. Social media companies combine automated review mechanisms and human review to examine the many millions of posts every day. By some estimates, Chinese social media companies employ tens of thousands of people to monitor and selectively block social media content. 3

JUMPING THE WALL
There are many options for Internet users who wish to get around government filters, an activity colloquially known as "jumping the wall" in China. These include simple proxies, which are easily added to blocking lists when discovered by the government, and more sophisticated tools designed specifically to avoid detection and resist blocking. Virtual private networks (VPNs), which are a standard tool for businesses to ensure the security of online transactions, are also commonly used to get around Internet filters. The Chinese government typically blocks a broad range of proxies and circumvention tools and engages in a technological battle with circumvention tool developers. The government has never succeeded in blocking all circumvention tools, and doing so would be harmful to online commerce, a situation the Open Internet Tools Project (OpenITP) dubbed "collateral freedom." 4 The end result is that Internet users who are intent on circumventing Internet filters are able to do so. However, jumping the wall entails investing time to identify and install tools that work and requires a level of technological sophistication. Moreover, using circumvention tools often slows down connection speeds. Circumventing Internet controls also implies a willingness to defy the government's standards for acceptable speech and to take on any perceived risks associated with using circumvention tools.
The implicit tradeoffs made by government regulators between information control and economic growth shift over time, and changes in the regulatory climate and technological controls often mean that certain circumvention tools and strategies no longer work. For example, after the OpenITP report, the Google App Engine, crucial infrastructure for e-commerce, was blocked in 2014, and circumvention tools that operated through this platform stopped working. In 2015, various VPN providers reported being blocked as well.
It is unclear how many people in China have used or regularly use circumvention tools. A Berkman Center study estimated users to be less than 5% of the online population. 5 There is a debate over the potential market for such tools and whether low adoption rates are a function of low underlying demand for these tools or of the inconvenience, security, and usability issues with the tools. Of particular relevance to this report, a recent survey of circumvention tool users in China found that using blocked social media sites, including Twitter, was among the most popular reasons for using these tools. 6 In addition to the technological proficiency required to acquire and use circumvention tools, users must also be willing to take on the risks of defying government content restrictions. For those who are able and willing to overcome these obstacles, we expect that users inside the Great Firewall (GFW) would find it more difficult to access Twitter, use it less often, and form fewer connections. A As we describe later, the data support this expectation. Users outside the GFW tend to have more followers and to follow more users on Twitter.
By contrast, in 2013, Sina Weibo had 556 million registered users, 50 million of whom were active on a daily basis. Other Chinese social media have also attracted a great number of users, including WeChat, QZone, and Renren. The popularity of Chinese social media suggests strong network effects: Chinese Internet users will place higher value on a platform if more of their family and friends are using this service or if the service provides access to a broader range of conversations. Based on network effects alone, Sina Weibo has a much larger pull than Twitter. Moreover, to use Twitter, Chinese users not only have to overcome the inconvenience imposed by the Great Firewall but also adapt to an environment with fewer potential familiar users and with different social norms.
The number of Twitter users in China is unknown. Even the platform operator itself will not have an accurate estimate of the number of users that physically reside in mainland China. Since Twitter users in China access the service through proxies or third-party applications, it is not possible to accurately determine location for many Chinese users. There are several ways to infer the location of users. In the user profile, there are location and time zone options, though there is no way to verify whether users disclose their true location. It may be possible to pinpoint a mobile user's location if the user allows Twitter or a third-party application to record and report the user's geographic coordinates. B We put the available location information to use later in the paper. We do not, however, attempt to estimate the number of active Twitter users in China. The available evidence suggests that active Twitter users comprise a small proportion of Internet users, but are also of sufficient number to maintain a vibrant online space for sharing ideas and information and creating content on online social networks. C
Activity on Twitter may serve different functions and denote different social processes. 7 It is also likely that there is substantial variation in the social meaning within each of these behaviors. A retweet or mention in one context might mean something entirely different in another context. Golder and Yardi assert that the "substantive nature of the social tie on Twitter is attention-based," which may cover a range of information seeking activities, whether focused on news, entertainment, or professional information. 8 Following another user may also be driven more by social motives: to stay in touch with existing friends, to find new friends, or to participate in a community of like-minded individuals. In their study of Twitter activity, Java et al. broke down activity into four categories: daily chatter, conversations, information sharing, and reporting news. 9 They divide users into three groups: information seekers, information sources, and friends, which span the range of information seeking and social activity. There is no clear consensus yet on how to interpret what it means to follow someone on Twitter, 10 which is unsurprising given the multiplicity of motives for forming ties on Twitter. Yet few doubt that these ties on Twitter are of social significance.
A In this paper, we use the term "Great Firewall" as a term of convenience. In this context, it is meant to represent the wide range of Internet content restrictions that apply to users that reside within mainland China.
B For a detailed explanation of how we assessed user location, see Appendix 1.
C There have been wildly disparate estimates of the number of Twitter users based in China, with one market research firm reporting 35 million Twitter users in China. Alternative, more credible estimates place the number in the tens of thousands. See Jason Q. Ng, "There are NOT millions of Twitter users in China: Supporting @ooof's result and refuting GWI's conclusion," Blocked on Weibo, January 6, 2013, http://blockedonweibo.tumblr.com/post/39828699303/there-are-not-millions-of-twitter-users-in-china.
An interesting question is why Chinese netizens use Twitter at all. We expect to find users sharing information and ideas not permitted on more regulated platforms. We might also expect to find users that are interested in engaging with international peers and audiences. Given the inconvenience of using circumvention tools, we anticipate finding tech-savvy users. And given the risks of engaging in conversations on topics that are censored in China, we might also expect to find a high proportion of anonymous or pseudonymous users.
In the following sections, we describe the structure and content in this enclave of less regulated discourse and frame some of the many questions that emerge related to the reach and impact of this online forum.

METHODOLOGY
NODE SELECTION AND NETWORK MAPPING
The analytical basis for this report is a mixed-methods research protocol: qualitative content analysis of Chinese Twitter supported by algorithmically drawn network maps and a diverse set of quantitative metrics calculated for each of the clusters in the network. The approach builds upon prior efforts to map online discussion spaces with boundaries drawn that correspond to physical geography. 11 We generated a social network map of Chinese Twitter accounts based on the relationships between the users (see Figure 1). The network structure is visualized using a physics model layout algorithm (Fruchterman-Reingold). The resulting network map and structures that emerge reflect the individual decisions of Twitter users to follow other users.
In the visualization, each node represents a single user account, and the size of each node reflects the number of followers from the network to that account. The location of each node relative to the others is based on the collective follow decisions of all of the nodes in the network. The network mapping algorithm is driven by two counteracting forces. One force repels, acting to separate all of the nodes from one another. A second force pulls together accounts that are linked by follow relationships and have followers in common as though by a spring or force of gravity. Thus, densely interconnected network neighborhoods "bunch up" in the map. In this way, one can think of the map as a picture of the pattern of influence and information flow in the network. The location of nodes on the map, which is based on follow relationships, reflects long-term, stable relationships. A mapping algorithm based on mentions and retweets would produce a map with more emphasis on shorter-term interests and influenced more by the content of tweets within the period in which data are collected.
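The two counteracting forces described above can be illustrated with a small sketch. This is not the authors' mapping code: it is a simplified, pure-Python layout in the Fruchterman-Reingold style, in which attraction acts only along direct links (the paper's layout also draws together accounts with followers in common). The function name `force_layout`, its parameters, and the eight-node toy graph are all assumptions made for illustration.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, seed=0):
    """Toy Fruchterman-Reingold-style layout; returns {node: (x, y)}.

    Hypothetical helper for illustration only, not the study's implementation.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = math.sqrt(4.0 / len(nodes))  # ideal spacing for a 2 x 2 drawing area
    temperature = 0.2                # caps how far a node may move per step

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsive force: every pair of nodes pushes apart (strength k^2 / d).
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attractive force: linked nodes pull together (strength d^2 / k).
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node by at most `temperature`, then cool the system.
        for n in nodes:
            dx, dy = disp[n]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temperature)
            pos[n][0] += dx / d * step
            pos[n][1] += dy / d * step
        temperature *= 0.98
    return {n: tuple(p) for n, p in pos.items()}


# Toy graph: two densely linked groups joined by a single bridging tie.
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
nodes = group_a + group_b
edges = [(u, v) for g in (group_a, group_b)
         for i, u in enumerate(g) for v in g[i + 1:]]
edges.append(("a1", "b1"))
layout = force_layout(nodes, edges)
```

In a layout of this kind, the members of each densely linked group should end up closer to one another than to members of the other group, which is the "bunching up" the text describes.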

FIGURE 1. NETWORK MAP OF CHINESE TWITTER ACCOUNTS
Determining the accounts that populate the map is based on a multistage process for collecting relevant accounts and removing nodes that are less connected with the core network. The mapping starts with a seed set of approximately 150 accounts compiled by researchers. This set was expanded by adding all of the followers of the seeds, and then reduced by removing all accounts with no activity over the prior 90 days (February through April 2013). After producing a preliminary map based on these accounts, each of the clusters was evaluated for relevance based on participation of and interaction with users from mainland China. For example, clusters entirely comprised of users from Taiwan with no ties to mainland users were given low relevancy scores, while clusters comprised largely of users in mainland China were assigned high relevancy scores. These cluster weights were used to iterate the map, bringing in more nodes highly connected to members of positively weighted clusters and excluding those more connected to negatively weighted clusters. This has the effect of "centering" the map on the intended scope (mainland China), while removing accounts from Taiwan, expatriates, and other non-mainland Chinese accounts.
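The collection steps described above (expand from seeds, drop inactive accounts, then re-center using cluster weights) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the data structures, the scoring rule (sum of the weights of the clusters an account follows into), and the toy accounts are all hypothetical.

```python
from datetime import date, timedelta


def expand_seeds(seeds, followers_of):
    """Return the seeds plus every account that follows at least one seed."""
    accounts = set(seeds)
    for seed in seeds:
        accounts |= followers_of.get(seed, set())
    return accounts


def drop_inactive(accounts, last_active, as_of, window_days=90):
    """Keep accounts whose last activity falls within the window before as_of."""
    cutoff = as_of - timedelta(days=window_days)
    return {a for a in accounts
            if a in last_active and last_active[a] >= cutoff}


def relevance_score(account, follows, cluster_of, cluster_weight):
    """Sum the weights of the clusters that an account's followees belong to.

    Assumed scoring rule; the paper does not specify the exact formula.
    """
    return sum(cluster_weight.get(cluster_of[f], 0.0)
               for f in follows.get(account, set())
               if f in cluster_of)


def recenter(candidates, follows, cluster_of, cluster_weight):
    """Keep candidates whose ties lean toward positively weighted clusters."""
    return {a for a in candidates
            if relevance_score(a, follows, cluster_of, cluster_weight) > 0}


# Hypothetical toy data: two seeds and three followers.
followers_of = {"seed1": {"a", "b"}, "seed2": {"b", "c"}}
last_active = {
    "seed1": date(2013, 4, 1), "seed2": date(2013, 3, 15),
    "a": date(2013, 2, 20), "b": date(2012, 12, 1), "c": date(2013, 4, 29),
}
follows = {"a": {"seed1"}, "c": {"seed2"}}
cluster_of = {"seed1": "mainland", "seed2": "taiwan_only"}
cluster_weight = {"mainland": 1.0, "taiwan_only": -1.0}

expanded = expand_seeds(["seed1", "seed2"], followers_of)
active = drop_inactive(expanded, last_active, as_of=date(2013, 4, 30))
kept = recenter(active - {"seed1", "seed2"}, follows, cluster_of, cluster_weight)
```

In this toy run, account `b` is removed for inactivity, and account `c` is excluded because its only tie is to a negatively weighted cluster.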
The resulting network map is comprised primarily of users in mainland China but is not limited to users there. International users are included if they are significantly connected with the core network of users. This includes some accounts that have large followings among mainland Chinese users, even if they follow few or no users in China, and international users that both follow and are followed by users in China. For example, users that are connected with Taiwanese user communities are only included in the map if they are also highly connected with this network of users centered on mainland Chinese users.
The final map includes a total of 8,275 accounts, constituting a set that is rich in detail and resolution while being of a tractable size for quantitative and qualitative analysis. As described in Appendix 1, we are able to infer location for 84% of the users, and of those, 73% are within mainland China.
This approach to gathering accounts is in effect a snowball sample. However, the high density of Twitter networks and the use of automated discovery tools eliminate most if not all of the potential starting point biases. It is possible that there are smaller isolated enclaves of users that have few or no ties to these larger communities and were therefore not discovered by the automated spidering process. However, it is unlikely that we have missed relevant clusters of users from our target population: Twitter users in China that are openly engaged with others on social and political issues. While the study of smaller, more isolated communities would be of interest in its own right, these communities are arguably not key participants in the networked public sphere and are outside the scope of this study.

DEFINING CLUSTERS
The resulting map is overlaid with colors representing each account's assignment to a group based on a clustering algorithm (Figure 2). The assignment to different color-coded clusters is based on the commonality in outward attention. Accounts that use the same hashtags, use similar language, cite the same URLs, and follow the same accounts are drawn together into clusters. Qualitative human judgment is subsequently used to interpret the results, but not to generate the maps themselves. A variety of metrics are calculated to help researchers understand and describe the contours and structures of the map. These metrics provide a quantitative measure of which activities occur proportionately more often in each cluster compared to the others, including:
• Twitter accounts followed
• Twitter accounts mentioned, retweeted, and replied to within a timeframe
• URLs cited in tweets
• Words used, including bigrams and trigrams (pairs and triples)
• Countries, cities, and other locations mentioned in user profiles
Researchers apply labels to each of the clusters based on a review of these data. These labels are not meant to categorize each account within a given cluster but rather offer concise shorthand descriptions for the various clusters. As we will describe in the next section, the clusters vary by the interests and perspectives of the users in each cluster. There is, however, overlap between some of the clusters in the topics that they discuss. Each user offers a unique perspective that does not necessarily map perfectly with the set of common themes found in each cluster. For many of the accounts within a cluster, the cluster label offers a good summary of the primary focus and general orientation of the account; for other accounts, the cluster label may not strongly capture the interests and views of the user.
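One way to read "proportionately more often in each cluster compared to the others" is as a ratio of an item's share inside a cluster to its share in all other clusters. The sketch below illustrates that reading only; the paper does not give its formula, so the function `relative_frequency`, the add-one smoothing, and the grouping of hashtags in the toy data are assumptions.

```python
from collections import Counter


def relative_frequency(items_by_cluster):
    """Score items by their share inside a cluster versus outside it.

    items_by_cluster maps a cluster label to a list of observed items
    (hashtags, URLs, followed accounts, ...), one entry per occurrence.
    A score above 1 means the item is over-represented in that cluster.
    """
    counts = {c: Counter(items) for c, items in items_by_cluster.items()}
    totals = {c: sum(cnt.values()) for c, cnt in counts.items()}
    scores = {}
    for c, cnt in counts.items():
        other_total = sum(t for o, t in totals.items() if o != c)
        scores[c] = {}
        for item, n in cnt.items():
            other_n = sum(counts[o][item] for o in counts if o != c)
            inside = n / totals[c]
            # Add-one smoothing (an assumption) avoids division by zero
            # for items that never appear outside the cluster.
            outside = (other_n + 1) / (other_total + 1)
            scores[c][item] = inside / outside
    return scores


# Hypothetical toy observations for two clusters.
toy = {
    "open_source": ["#linux", "#vim", "#linux", "#git"],
    "software_dev": ["#git", "#jquery", "#git", "#osx"],
}
scores = relative_frequency(toy)
top_open_source = max(scores["open_source"], key=scores["open_source"].get)
```

Ranking items by such a score surfaces what distinguishes a cluster rather than what is merely popular everywhere, which is the kind of evidence a researcher would review before labeling a cluster.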
The final map of 8,275 users (either residing in China or strongly connected with Chinese residents) consists of 36 clusters oriented around three main topics: politics and human rights; technology; and culture and entertainment (see Figure 3). The political group consists of nine clusters and 2,334 users, the technology group comprises 2,029 users divided into nine clusters, and the entertainment group, made of eight clusters, contains 1,394 users. A fourth set of ten clusters with a total of 2,518 users spans a broader number of interests and includes bloggers, users based in Hong Kong and Taiwan, and groups that are oriented around celebrities.

NETWORK DESCRIPTION
POLITICAL CLUSTERS
Nine distinct clusters emerge in the network that are primarily focused on politics and news. The popular topics and political orientation of discussions in these clusters all gravitate toward issues related to dissidents, pro-democracy movements, human rights, and the concerns of activists. There is substantial overlap in interests among many of the political clusters.
The Human Rights Activists & Lawyers cluster mainly follows and interacts with activists, e.g., Ai Weiwei, Chen Yunfei, and Zeng Jinyan; D,E and human rights lawyers, scholars, and writers, e.g., Zhang Lifan, Teng Biao, and Mo Zhixu. During data collection in 2013, many of the users in this cluster lived in China. One of the causes taken up by users in this cluster was demanding an investigation into the student casualties in the Sichuan earthquake. The hashtag #512birthday was created to memorialize the young victims from the earthquake, which occurred on May 12, 2008. Another popular hashtag, (fake case), was used to rebut the tax evasion charges against Ai Weiwei, which Ai and his supporters believe were made in retaliation for his efforts to reveal the government's role in the high death toll during the earthquake. Although the Chinese government admitted shoddy construction might have contributed to the deaths of thousands of young students, 12 it meanwhile suppressed pleas from citizens and media to investigate possible corruption or negligence of local officials by attempting to pay bereaved parents to stay silent 13 and detaining domestic and foreign investigators. 14 Another cluster, Ai Weiwei Followers, follows and interacts with a similar set of accounts, e.g., Chen Yunfei, Wang Lihong, etc. They pay particular attention to Ai Weiwei, who receives the most mentions, retweets, and replies in this cluster. The members also frequently use #aiflower as a show of support for Ai. The hashtag refers to flowers that Ai has been placing in a bike basket in front of his house since being put under house arrest on November 30, 2013.
The Human Rights Advocates cluster includes human rights advocates from diverse backgrounds. Activists, lawyers, writers, filmmakers, and scholars, many of whom live outside China, appear in this cluster. They pay attention to Chinese media operated overseas, e.g., (Boxun), (Canyu), (Rights Protection), and (Civil Rights & Livelihood Watch). Among the popular hashtags are those that refer to Zhang Anni, a 10-year-old girl who was denied permission to attend two schools because her father was a dissident, and to Sun Zhigang, a young man who was sent to a detention center because he carried no documents and was later found dead there.
The China Bridge cluster serves as a bridge between China and the outside world thanks to two groups of accounts: Chinese people with overseas connections and foreign correspondents, scholars, and business people who are living or used to live in China and are able to understand the Chinese language. This includes, for example, Hu Jia, a Chinese activist who has been recognized and awarded by foreign organizations; Qian Gang, a well-known journalist residing in Hong Kong; Isaac Mao, a Chinese technologist and investor, who has served as a board member to the Tor Project, an advisor to Global Voices Online, and a Fellow at the Berkman Center; Jeremy Goldkorn, a South African living in Beijing who runs Danwei, a China-focused blog; Melissa Chan, a former China correspondent for Al Jazeera, who was recently denied entry to China; Rebecca MacKinnon, a former CNN Beijing Bureau Chief who currently works with the New America Foundation's Open Technology Institute; Perry Link, a prominent sinologist whose entry to China has been denied since 1996 due to his involvement with the Tiananmen Papers; and ChinaFile, an online magazine focused on US-China relations.
D In this paper, we use Pinyin, a standard phonetic system from mainland China, to Romanize Chinese names with a family name preceding given names. For users who refer to themselves with a Romanized version of their name, we follow their usage. In some cases, this means that their given name precedes the family name.
E The names cited in this study are either prominent public figures or cited with the consent of the users. If we were unable to obtain consent for less prominent users, i.e., those not cited in major media, we do not report their names and user accounts.
Users in the Citizen Journalists cluster live both inside and outside China and are active in civic causes. For example, Zhu Chengzhi is a businessman turned activist; Suofeng Zhou has been in exile since 1989 and is the co-founder of Humanitarian China; Yaxue Cao is the founder and editor of ChinaChange.org. Users in this cluster mainly follow Chinese media operated overseas, e.g., (Boxun), (Canyu), (Conscience China), and (Rights Protection). They often use hashtags that refer to Liu Xia, the wife of detained Nobel Peace Prize winner Liu Xiaobo.
The Dissidents & Reformists cluster follows and interacts with people in exile involved with the Tiananmen Square Protests in 1989, including Bob Fu and Wang Juntao, and other exiles that left China before or after 1989 due to their advocacy of political reforms, e.g., Hu Ping, He Qinglian, and Yu Jie. There are also ties to dissidents and reformists remaining in China, e.g., Woeser, a Tibetan writer, and Wang Lihong, a human rights advocate. Common hashtags used in this cluster relate to democracy and advocacy efforts directed at freeing detained dissidents. The links shared in this cluster often lead to foreign media, e.g., VOA, RFI, Radio Free Asia, and Chinese media operated overseas, e.g., (Boxun), (Canyu), Epoch Times, Human Rights in China, and 64 Tianwang.
The attention of the Journalists & Writers cluster is directed at famous journalists, writers, and bloggers, including, for example, Xiao Shu, a historian who was prohibited from teaching for seven years after the Tiananmen Square Protests in 1989. His books were banned in 1999, he was forced from his position as editor-in-chief in 2005, his column was withdrawn in 2011, and his Sina Weibo account was deleted in 2012. Compared to other clusters, users in this cluster tend to employ more sophisticated language and sarcasm in their tweets, including concepts and terms that are not widely discussed by average users, e.g., (Foucault must kneel), #democraticdev, and #h7n9 (a reference to an avian influenza virus).
The Free China & Free Tibet cluster distinguishes itself by sharing links and using hashtags that commonly relate to the Jasmine Revolution and Tibetan exiles. The first group includes Jasmine Action, Sound of Hope, China Spring of Freedom, Secret China, and China Democracy Party, while the second includes Tibet on the Map, Save Tibet, High Peaks & Pure Earth, Tibetan Centre for Human Rights and Democracy, and Central Tibetan Administration.
The Political News cluster follows and interacts with news and information aggregators, e.g., (Twitter Watch), (anecdote collection), (Boxun), Invisible Tibet, and Free Weibo, a full Weibo archive that includes deleted posts. Hashtags reflect an interest in diverse news topics, e.g., (great dynasties, a phrase that is used to remind Chinese of the glorious history and the bright future), #512birthday (in memory of student victims during the Sichuan earthquake on May 12, 2008), and (scientific use of the Internet, a comical reference to "jumping the wall").

TECHNOLOGY CLUSTERS
The Software Development cluster pays particular attention to software engineers inside and outside China. Their frequent hashtags are often related to development, e.g., #jquery, #git, #productivity, #osx, and #backend. Compared to other tech clusters, the Open Source Software cluster pays close attention to information related to open source software, as reflected by their frequent hashtags, e.g., #ubuntu, #vim, #linux, #bash, and #firefox. The majority of users in these clusters reveal they are located in Beijing.
The Tech Gurus cluster includes popular figures in the tech world, e.g., Feng Dahui, a prominent programmer and CTO of an IT firm, and Xi Qiao, author of the comic series Mysterious Programmers. The links they share are related to development, e.g., GitHub, Ruby China, Stack Overflow, CSDN (China Software Development Network), and SlideShare. The Tech Entrepreneurs cluster devotes attention to technology and entrepreneurship. The top users they follow and interact with include famous names in the field, e.g., Tim O'Reilly, Om Malik, and Robert Scoble. The links they share are from Kickstarter, Wired, and WSJ Technology, and the top hashtags include #sxsw, #wmc (World Mobile Congress), and #bigdata. Many members of this cluster are located in the Bay Area and New York.
The focus of the Tech & Media cluster is oriented toward media-related accounts, e.g., @guao, a news feed focused on Google products, and Jason Ng, a tech entrepreneur and blogger. The majority of the users reveal they are located in Beijing. The members of the Tech & Culture cluster show interest in gadgets and devices, reflected by the frequent hashtags: #gmail, #lastfm, #alipay (the payment service by Alibaba), #siri, #ipadmini, and #ux (user experience). Many of them also follow Jason Ng.
The Apps & Misc Tech cluster pays much attention to Apple products and related apps. During the time of data collection, the members were especially interested in apps related to fitness and weather, reflected in the hashtags, e.g., #maccn, #appcn, #fitstats, #fitbit, and #instaweather. The Internet Freedom & Tech cluster pays attention to tech entrepreneurs (Dash Huang), investors (Kai-Fu Lee), and tech bloggers (@keso). They express their eagerness to jump the Great Firewall with text and images in their profiles and tweets.
The Tech News cluster pays much attention to the tech news aggregator @ifanr, a popular Chinese tech blog. Their top hashtags and links are more fun than technical and occasionally refer to adult content, which is not observed in the other clusters. The majority of the users reveal they are located in Beijing.

ENTERTAINMENT CLUSTERS
Three of the clusters in the entertainment network share a love of anime, comics, and games (ACG), with many links to Japanese websites. Two of these, ACG Shanghai and ACG Guangzhou, are comprised of users from distinct geographic regions; the former includes users predominantly from Shanghai and the latter from Guangzhou, although users from Hong Kong and Chongqing are also found in this cluster. The ACG Sharing cluster includes users from across the country, though with similar interests to the other two ACG clusters.
Two clusters, Fun Reads and Curiosities & Humor, tend to pay attention to entertainment-oriented Chinese websites such as (encyclopedia of embarrassment). The latter cluster also shares an interest in ACG; cartoon characters are a popular choice for profile pictures in this cluster.
Two clusters, Hong Kong Users and Taiwanese Users, are comprised primarily of users from those specific locations, although with connections to the mainland China-based clusters and accounts. There are many other users from these and other locations with fewer connections to this network who are therefore not included in this study. Another two clusters, Japanese Celebrities and US Celebrities, are drawn together by their common interest in celebrity news.
An Online Shopping cluster appears to be comprised mainly of overseas Chinese who frequently tweet in Chinese. This cluster pays attention to e-commerce websites, e.g., Taobao and aihuishou.com (love to recycle/resell). The Foreign Media cluster follows and interacts primarily with English-language foreign media, e.g., AP, BBC, CNN, Huffington Post, New York Times, Washington Post, etc.
The Shanghai Bloggers cluster has many users from Shanghai who are interested in a variety of websites, e.g., Shanghaiist (a popular English website about China), Zhihu (a Chinese Q&A platform), and Douban (a social network gathering people with similar interests in music, film, literature, and other cultural products). Three other clusters include users with broader interests that span news, technology, and entertainment.

TOP USERS
The list of users with the most followers from this network consists primarily of political activists and journalists, but also includes popular figures from the technology and entertainment world. Han Han is a professional rally driver and writer, and his blog is among the most widely followed in the world. Because Han openly questions authorities in China, he has been widely admired by younger generations of Chinese and extensively applauded by Western media, helping him to build strong connections with both Greater China and other parts of the world and reach different audiences across social media platforms. A few other top Chinese Twitter users are also active on Sina Weibo, such as @williamlong and @hecaitou, who focus on technology, which makes them less dependent on Twitter. However, a number of top Chinese Twitter users, such as Ai Weiwei and Ran Yunfei, do not have accounts on Weibo because they are political dissidents silenced by authorities in China.

LOCATION OF USERS IN THE NETWORK
As described earlier, the network map is focused on users in mainland China, but is not limited exclusively to users there. Although we are not able to accurately determine where each and every user in the network resides, we can make reasonable inferences about location based on the several sources of location information available. Of particular interest to us is whether users reside within mainland China or elsewhere. H There are two sources of helpful information in the Twitter profile data: user-provided information on location and the user-selected time zone. Each of these fields is provided by users who may wish to mask their location, so we must treat these data with caution. In addition, the metadata for many of the accounts include longitude and latitude coordinates.
Where longitude and latitude data are available, we use them to determine location. This accounts for 17% of the users in the network. If coordinates are not available, we next turn to the user-provided location name. If we are able to match the user-provided location name to a known geographic location, we use this to infer location. An additional 56% of the users fall into this category. Among the remaining 27% of users, 17% left the location field blank and 10% provided information that could not be associated with a specific location (e.g., names glossed as "enemy-occupied area" and "Demolition Office, Pandora Moon", as well as "404 not found" and "Airstrip One, Oceania"). However, for approximately half of these users, we are able to infer from their choice in time zone that they reside within China. Using these various sources of information, we are able to estimate for 84% of the users whether they reside within China or not.
As seen in Table 3, in most of the clusters the great majority of users are in mainland China, although there is significant variation across the clusters. For 26 of the 36 clusters, three quarters or more of users appear to live in mainland China. The clusters within the technology and entertainment groups have a high proportion of users within China.
At the other end of the spectrum, there are six clusters that are composed primarily of users based outside of China: Foreign Media, Tech Entrepreneurs, Hong Kong users, Taiwanese users, and the US and Japanese celebrity-focused clusters. The users in these clusters tend to have asymmetric relationships with significantly more followers in this network than followees. The inclusion of these clusters in the network is driven by the outward facing interests of users based in mainland China. Comparing the clusters by the location of their followers in the network, there is less variation; between 70% and 90% of followers aggregated across each cluster come from within China. Looking across the groups, the entertainment and technology clusters tend to have a larger percentage of followers who reside in mainland China, compared to the political clusters. Similarly, the location of followees, which represents the external focus of the users in each of the clusters, does not vary greatly, except for the six clusters with a majority of users located outside of China. Again, the political clusters display a higher level of interest and engagement with users outside of China compared to the technology and entertainment clusters. Overall, considering the location of users, their followers, and their followees, it appears that users in this network based in mainland China are well integrated with users from outside the country. Figure 4 shows that the users outside of mainland China (blue) have both more followers and followees from within the network than their counterparts inside China (red). This aligns with the expectation that the inconvenience of overcoming the GFW translates into fewer connections for those that reside in mainland China.

INTERACTIVE CONNECTIONS ACROSS CLUSTERS
Many of the 36 clusters in the network share areas of interest. To gauge the level of separation and common attention between the clusters, we map the clusters around areas of disproportionate attention, including the accounts that are followed, mentioned, replied to, and retweeted. These two-mode networks are created using iGraph for R for each of the political, technology, and entertainment groups. 15 Not only do these charts illustrate how the clusters are connected by common attention; they also reflect more dynamic information flows across clusters through retweets, replies, and mentions. We have visualized all three types of interactions (retweets, replies, and mentions) but present here only the visualizations based on retweets, as the results are similar for all three. In the visualizations, edges connect clusters to the accounts that they disproportionately retweet; a normalized score was calculated for messages retweeted from each account to measure the level of disproportionate interest from each cluster. The width and color of the edges are proportional to the normalized attention score: a higher level of attention from a cluster to an account is displayed by a wider, more deeply colored edge (see Figure 5). and Tech News were retweeted more internally to these clusters. Quite differently, the entertainment clusters showed a clear divide between ACG lovers and other fun seekers. This gap may be induced by the nature of cultural products, which often arouse strong preferences and avoidances (Webster 2014). 16
Examining hashtags helps to reveal the differences in interests and attention among the different clusters. The political chart shows the affinity between several of the clusters with a focus on dissidents and activists, and a gap between these clusters and the China Bridge and Free China & Free Tibet clusters.
The dissident- and activist-focused clusters appealed for the freedom of their fellow members, including Chen Guangcheng, Yang Kuang, Chen Yunfei, Xu Wei, and others. During the time of data collection, Liu Xia was at the top of the list, and many hashtags were related to her, including # (Liu Xia), #freelx, #freeliuxia, # (free Liu Xia), and #freeliu. On the other side of the chart, the China Bridge cluster paid more attention to issues and regions rather than individuals, such as #sichuanquake, #xinjiang, #china, #pollution, #saudi, #northkorea, #southkorea, #thatcher, #senkaku, #nkorea, #dprk, #chine, #japan, #uyghur, #asia, #npc, #fdi, #sichaun, #nra, #birdflu, #nuclear, #eu, #southafrica, #us, #pyongyang, #nkorean, #sichuan, #dalailama, #india, #tibet, #syria, #quake, #environment, #tibetans, #afp, #weibo, #askeconomist, #xi, #iraq, #chinese, and #diplomacy. Members of the Free China & Free Tibet cluster talk more about Tibet and the Jasmine Revolution, which is reflected by their hashtag choices: #tibetonthemap, #tibetan, #tibet, #chinese, #tsamparevolution, #fujian, #tibetans, #gyama (Gyama Valley, where mine disasters occurred), # (anti-Japan), #molihua (jasmine), #cnjasmine, #netdog (contemptuous name for Internet commentators), #lhasa, #hongkong, and #woeser (a Tibetan writer). The entertainment clusters also appear divided between ACG lovers and other users, and the hashtags reflect their distinct interests in ACG and other fun activities.

INTERNET MONITOR
Many of the discussions on Twitter would have been prohibited by domestic social media. In particular, the most relevant political hashtags are commonly censored in China, including 1989, Tibet, freedom, democracy, national security ( ), and the names of both government officials and dissidents, such as Ai Weiwei, Liu Xia, and Xi Jinping. The hashtags most frequently retweeted by the technology clusters are less sensitive, including #backend, # (mobile social network), #iosdev, #galaxysiv, #iwatch, and #nerds. The entertainment hashtags tended to be entertaining and ironic, such as # (good morning, tweeting gods and goddesses), #( (my son is a weirdo), #/ (walking encyclopedia of adult videos), and # (happy to father a child).

PROFILE PICTURES AND IDENTITY MANAGEMENT OF USERS
The fact that Twitter is outside of the reach of Chinese authorities does not mean that China does not seek to shape behavior on the platform. Some Chinese netizens have been arrested because of activity on Twitter. In 2012, Zhai Xiaobing published a political satire on Twitter before the 18th National Congress of the CPC and was charged with spreading terrorist rumors; 17 in 2014, Zhao Huaxu proposed on Twitter a technical idea for sharing knowledge about the Tiananmen Square Protest and was later arrested on charges of spreading illegal information. 18 Under such circumstances, we wondered whether Chinese users would prefer to hide their identities on Twitter to avoid risks. Because it is common for Internet users to adopt a nickname and be recognized by this handle, it is difficult to draw conclusions about a user's intention to remain anonymous solely by use of a pseudonym. Instead, we took profile pictures as a proxy for preferences regarding anonymity. Our assumption was that users' willingness to expose their identities would be correlated with the use of their photos in their profiles.
To answer this question, we retrieved user information of the 8,275 accounts in the sample through the Twitter API, downloaded the profile pictures of the existing users (about 1% of the accounts had been closed), and fed them into Face++, an online service specializing in detecting features of human faces, including gender, age, and race. 19 Face++ appeared to have identified real faces fairly well and accurately labeled cartoon characters and artistic portrayals as non-faces. Among the currently active users, Face++ recognized 28.4% as having one or more faces in their profile pictures, and found the remaining 71.6% to have no face at all. Among the profile pictures with faces, Face++ labeled 29.8% of them as female and the remaining 70.2% as male, and identified 66.3% as Asian, 3.0% as black, and 30.8% as white.
The automated approach to estimating use of real faces has a few limitations. First, Face++ missed some real faces when the photos were taken at an abnormal angle or heavily retouched. Another downside of the automatic processing is that we have no way to know when an account is using someone else's photo, e.g., a photo of a pop star. To assess the validity of the automated detection results, we manually checked a random sample of accounts for the presence of faces that appear to be of the account holder. Based on 367 cases drawn from the political cluster, we found that 25% of users included what appears to be their real face, which is close to the results of the automated assessment. For the manual check, we eliminated accounts that had no face, accounts with the face of a child, and accounts with the face of an easily recognized celebrity. For adult faces, we also conducted a Google Images search to look for matches that would indicate that a user had placed someone else's face in their profile. I Based on the automated face detection results (see Figure 6), the users in the political group appear to show their real faces more than users in the technology and entertainment groups, instead of voicing opinions behind a mask. This result, which contradicts our expectations, is perhaps because openly voicing opinions with a disclosed identity is a calculated strategy to increase the impact and reach of advocacy efforts in full recognition of the risks involved. Celebrities and tech gurus inside and outside China also tended to publicize their faces, presumably to enhance their popularity. Interestingly, the entertainment group displayed real faces the least, even though they arguably bear a lower risk for their activities on Twitter. This phenomenon may come from the subculture among

SOCIAL NETWORK STRUCTURE
In this section, we explore whether there are systematic differences in the structures of the various clusters. As described by Smith et al.: "Conversations on Twitter create networks with identifiable contours as people reply to and mention one another in their tweets. These conversational structures differ, depending on the subject and the people driving the conversation." 20 In the analysis that follows, we seek to describe in preliminary terms whether the structure of the clusters is correlated with the topical focus of these communities and offers insights into the functional relationships among users.
Twitter allows three possible follow relationships between each pair of users: no connection, one-directional connection, and mutual connection. Based on these follow relationships found in each of the clusters, we calculate three metrics to help interpret the differences in the structure of these clusters: density, mutuality tendency, and degree centralization. J Table 4 lists these descriptive statistics for each of the 36 clusters in the network, along with the number of users and connections within each cluster. The metrics are based solely on follow behavior and therefore denote longer-term static relationships and do not fully reflect users' interactions with others through retweets, replies, and mentions. J In Appendix 2, we show a fourth measure, indegree centrality. Density is a measure of the overall level of connectivity among nodes in a network. This metric is calculated as the number of connections between users in a cluster as a proportion of the total possible number of connections. K
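As an illustration of two of these metrics, the following minimal Python sketch computes directed density and an indegree centralization score for a toy cluster. It is not the code used in this study: the function names and the toy data are hypothetical, and the centralization normalization (dividing by (n - 1)^2, the value reached by a star in which every other user follows a single hub) is one common Freeman-style convention that we assume here.

```python
from collections import Counter


def density(n_users, follows):
    """Directed density: observed follow ties / possible ordered pairs."""
    if n_users < 2:
        return 0.0
    return len(set(follows)) / (n_users * (n_users - 1))


def indegree_centralization(users, follows):
    """Freeman-style indegree centralization.

    Assumption: normalized by (n - 1)^2, the maximum for a directed
    graph without self-loops (a star pointing at one hub).
    """
    n = len(users)
    if n < 2:
        return 0.0
    indeg = Counter(dst for _, dst in set(follows))
    top = max(indeg.get(u, 0) for u in users)
    total = sum(top - indeg.get(u, 0) for u in users)
    return total / ((n - 1) ** 2)


# Hypothetical toy cluster: b, c, and d all follow a; a follows b back.
users = ["a", "b", "c", "d"]
follows = [("b", "a"), ("c", "a"), ("d", "a"), ("a", "b")]

cluster_density = density(len(users), follows)
cluster_centralization = indegree_centralization(users, follows)
print(round(cluster_density, 3), round(cluster_centralization, 3))
```

In this toy cluster, low density together with high centralization corresponds to the pattern of a cluster organized around a single prominent account.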

Density varies substantially across the different clusters (Figure 7). Generally speaking, if people get online to seek information and entertainment or to create and solidify social ties, we imagine those who pursue several of these purposes may use the platform more frequently and consequently form denser networks. Among the four groups (politics, technology, entertainment, and mixed), the entertainment group has the highest density scores, followed by the political and tech groups. The clusters with the highest density scores in each group are the Human Rights Advocates and Citizen Journalists clusters in the political group, Open Source Software and Tech Gurus in the technology group, ACG Shanghai and Shanghai Hipsters in the entertainment group, and Hong Kong Users and Shanghai Bloggers in the mixed group. One plausible explanation is that these groups are populated by more engaged users that seek out more information and are more motivated to network with like-minded people. It could be that some of the clusters have stronger commonality of interests than others, or some topics could attract more engaged users. Another possible explanation is that a higher proportion of members of these clusters know each other personally.
There are many clusters at the opposite end of the scale with low density levels (e.g., Dissidents & Reformists, Journalists & Writers, Software Development, Tech Entrepreneurs, Curiosities & Humor, Fun Reads, and News, Tech & Entertainment). The variation in density scores is higher within the four groups (politics, technology, entertainment and mixed) than the variation across the four groups, suggesting that the set of factors that influence density in these clusters applies to all of the topics.
The indegree centralization scores measure the extent to which connections are evenly divided across the nodes in each cluster or are more highly directed at a small number of nodes. A high centralization score means that a small number of prominent nodes are the focus of attention. Low density scores combined with high centralization scores indicate that some of these clusters were formed around star writers, activists, and technologists, as seen in the Journalists & Writers, Tech & Media, Human Rights Activists & Lawyers, Dissidents & Reformists, and Curiosities and Humor clusters. The members of these clusters may be satisfied to receive information from the hub users and less motivated to build ties with others in the cluster. In the case of the Tech & Media cluster, the low density measures may be explained by the fact that the top users in this cluster, @hecaitou and @williamlong, are also active on Sina Weibo and have attracted about 400,000 followers on that platform. This cluster on Twitter could be a shadow of a more active conversation on Weibo. The two top users mainly discuss technology, which is acceptable in China, thereby reducing the need for an alternative, less regulated platform.
The Dissidents & Reformists cluster also had an extremely low density score, suggesting the users in this cluster either lack interest in forming connections or are deterred from doing so. If nothing else, this suggests that the Great Firewall is effective at isolating dissidents from each other and from those outside their circles. The low level of connectivity could reflect the low public attention that Chinese dissidents receive, because human rights violations and dissidence are rarely reported by Chinese media. Although the New York Times (NYT) and the Hong Kong-based South China Morning Post (SCMP) cover these issues much more than the China Daily, 21 non-Chinese media may fail to boost the fame of human rights activists among mainland Chinese for two reasons, implied by Krumbein's findings. First, the SCMP and NYT mainly cover Chinese issues around major events, such as the US President's visit to China and the Beijing Olympics. Second, they pay repeated attention to the Tiananmen Square anniversary crackdowns and provide less coverage for other events. In addition, the language barrier hinders Chinese speakers from learning about dissidents from their own country.

RECIPROCITY AND EQUALITY
Since Twitter allows asymmetric connections across users, mutuality (reciprocity) may reflect different underlying social dynamics. In particular, high reciprocal connections may indicate familiarity, comparable prominence, or the exchange of information around shared interests. For example, friends make mutual connections on Twitter, whereas entertainers, politicians, and tech gurus rarely reciprocate the connections with their fans, supporters, or admirers. Homophily and equality appear to be important aspects in forming reciprocal relationships on Twitter. Weng et al. find that a large portion of reciprocal ties on Twitter might be explained by common interests. 22 Wu et al. find significant homophily within different categories of Twitter users with "celebrities following celebrities, media following media, and bloggers following bloggers." 23 Kwak et al. show that the number of followers in reciprocal follow relationships ("r-friends") are correlated, implying a measure of equality in mutual relationships on Twitter. 24 Golder and Yardi say that "mutuality might be a proxy for equal status." 25
As a measure of reciprocity, we calculated mutuality tendency, which measures the conditional probability of a reciprocal follow relationship for each follow relationship. 26 Among the four groups, entertainment had a higher tendency towards mutuality compared to the political and tech groups (Figure 8). The high tendency might indicate that more of these people are friends, or could reflect the exchange of media and information, such as animation, comics, games, and fun activities, or that the users tended to share more similar levels of prominence in their clusters.
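The conditional-probability reading of mutuality tendency can be illustrated with a short Python sketch. This is a simplified stand-in under our own assumptions, not necessarily the exact formula of the cited measure: it reports the share of directed follow ties whose reverse tie also exists, and the function name and toy data are hypothetical.

```python
def mutuality_tendency(follows):
    """Share of directed follow ties that are reciprocated.

    Assumption: each (follower, followee) pair counts once and
    self-follows are ignored.
    """
    ties = {(a, b) for a, b in follows if a != b}
    if not ties:
        return 0.0
    reciprocated = sum(1 for a, b in ties if (b, a) in ties)
    return reciprocated / len(ties)


# Hypothetical toy cluster: a and b follow each other; c and d follow a only.
follows = [("a", "b"), ("b", "a"), ("c", "a"), ("d", "a")]
score = mutuality_tendency(follows)
print(score)
```

A cluster built around a few "stars" would score low on this measure, since most ties point toward the prominent accounts and are not returned.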
Within the tech group, Tech Gurus and Tech Entrepreneurs were less likely to follow back, whereas the Tech News and Software Developers clusters were more likely to make mutual connections, perhaps because they sought each other's technical and emotional support and relied on each other for the latest tech updates. In the same light, among the political clusters, Free China & Free Tibet advocates and Political News followers had the highest reciprocity scores, perhaps also driven by information and support seeking. The least reciprocal clusters include Dissidents & Reformists, Tech & Media, and Journalists & Writers, which feature high-profile dissidents, tech critics, and famous writers, respectively. These top users had more followers than followees, suggesting these clusters were highly biased toward the "stars."
We find considerable variation in network structure across the various clusters, which likely reflects different informational and behavioral forms. These different structural forms are found whether discussing politics, technology, or entertainment. In Chinese Twitter, topical coverage does not determine network structure. There are, however, broad differences between the clusters, as shown in Tables 5 and 6. The clusters in the entertainment group are more highly connected and more likely to form reciprocal ties. The political and technology groups are comparatively less dense and more centralized, and their users less likely to form reciprocal ties.
Several of our expectations prior to carrying out this research are confirmed by the analysis. We find a major focus of Chinese Twitter to be on politically contentious topics that would be blocked on domestically hosted platforms. The presence of these groups is consistent with the commonly stated proposition that there is a small group of highly motivated political activists who are willing to take the necessary steps to get around Internet censorship.
The political crowd, who would face the highest risks for their online speech, appear to be the least likely to seek anonymity. The fact that many of these advocates openly tweet under their real names and include their face in their profiles suggests a willingness to risk their personal freedom, or alternatively a belief that international recognition may help to shield them from persecution.
It is not surprising to find groups of technologists on Chinese Twitter. This is consistent with the idea that it is relatively simple to circumvent the firewall for those that are technologically adept. The presence of several clusters that focus much of their attention on culture and entertainment would have been harder to predict.
In some ways, the discourse in the politically engaged portions of Chinese Twitter suggests that this is indeed an alternative public sphere in which networked individuals can cooperate in a peer-produced system that collectively filters and highlights areas of public concern with less reliance on large media organizations and under less government control, allowing a shift towards more bottom-up agenda setting and framing of issues. 27 However, the seemingly sparse connections between Chinese dissidents and their compatriots may be the result of limited domestic media coverage and a language barrier between users and Western media outlets.
Chinese Twitter falls well short of supporting an inclusive and broadly accessible networked public sphere. The proportion of the Chinese populace with direct access to the debates, communities, and shared resources on Twitter is very small, and the avenues by which such discourse might find its way into mainstream political discussion are severely constrained. The firewall between Twitter and the much larger social media platforms in China appears to form a formidable barrier. Future research may shed further light on the effectiveness of indirect methods of information diffusion across these separate networks. One possibility is word of mouth. Another is the injection of perspectives, topics, and frames from Twitter into Weibo, although any such transfer would have to survive the domestic filters.
Tracking the evolution of Chinese Twitter over time may help to answer some of the outstanding questions. One possible trajectory would be the addition of new users interested enough in more open discussion of political issues to surmount the obstacles of the Great Firewall. It is also easy to imagine a growing number of technologically proficient Chinese netizens joining international social media platforms. Another complementary path would be the addition of more users to Twitter and similar platforms that are neither among the most highly engaged politically nor among those interested in technology, which would cover a much wider cohort of Internet users. It may be instructive to learn more about the motives for users in the culture and entertainment groups to join Twitter.
There are several areas where future research will help us to better understand the impact and reach of such Internet enclaves set apart from their larger media spaces by technological and regulatory barriers. More theoretical and empirical work is needed to understand the relationship between the structural form of these networks and the functional aspects of communication and community building via digital platforms. One limitation of this study is the difficulty in interpreting how the characteristics of these digital networks may or may not reflect a growing and vibrant public sphere. Studying the evolution of these networks over time will provide some answers. Comparing the form and function of Chinese Twitter to other similar networks in different regulatory contexts may be fruitful as well. Another useful extension of this work would be to conduct interviews with participants to help to bridge the gap between the digital record and user perceptions on their motivations, intent, and purpose of Twitter activity.

APPENDIX 1: ASSESSING USER LOCATION
We cannot establish with certainty the location of Twitter users. There are, however, several sources of information that help us assess the likely location of the users in the network. The metadata for many of the accounts include longitude and latitude coordinates. We also have access to user-supplied information in three fields of Twitter profiles: language, time zone, and location. Since each of these fields is user defined, we do not expect them to be fully reliable and treat these data with caution.

INTERFACE LANGUAGE
Users can select the interface language on Twitter. In addition to choosing among languages such as English, Spanish, and Chinese, users can specify Chinese as simplified or traditional, the two major Chinese writing systems, each of which is standard in different parts of the world. Users of simplified Chinese are more likely to come from mainland China, whereas those of traditional Chinese are more likely to come from Taiwan or Hong Kong. In our sample, however, the most frequently chosen language is English, rather than Chinese of either type. English is the default option, and many users might not find it necessary to change this, even if they still tweet mostly in Chinese. Even though language is not a precise proxy for mapping users' locations, we found that language preference was somewhat correlated with users' interests. M Figure 9 below shows that the users in the political group are more likely to configure the interface as Chinese, while English is more common in the technology group.
Another reason why some Chinese users chose a non-Chinese time zone is that they apparently want to distance themselves from the country. For example, a user explained that she always uses Hong Kong as the time zone because a candlelight vigil is held there every year as a memorial to the Tiananmen Square Protest (Figure 13). Chinese users who choose a non-Chinese time zone often find Hong Kong and Taipei more convenient than others because these share the same time as Beijing, meaning the user's experience on Twitter is not altered.

USER REPORTED LOCATION
Although language and time zone fail to provide reliable information about user locations, the "location" field on the user profile provided better insights. The location data, which were retrieved through the Twitter API, generally fell into one of five categories: known places, recognizable nicknames for certain places, coordinates, unrecognizable places, and blank. First, many users set their locations with known places written in various languages, such as Boston, (Beijing/Peking), and (Seoul). Occasionally, users listed multiple known places in this field, such as Nanjing/New York. In this case, we assigned the first value to the given user. We then converted these known places to coordinates using the GeoNames API. O When an upper level region was given as a location, a lower-level place (mostly capitals or populous places) was returned by the GeoNames, e.g., US converted to New York, China to Beijing, and California to Los Angeles. For this reason, Beijing and New York were overrepresented because they were used to represent users living in China or the US who did not specify a location at a lower level.
The second category consisted of places with a known nickname. For instance, nicknames for China included terms glossed as "West Korea" and "Heaven Dynasty" (a historic and sinocentric way for a Chinese dynasty to refer to itself, and a term now used as satire by Chinese netizens), "Ceramic Country," "Country of Harmony," and a phrase that resembles the pronunciation of Chi-Na and means "demolish where" in Chinese. In addition, Beijing, Shanghai, and Guangzhou, the largest cities in China, are often referred to by nicknames glossed as "imperial capital," "magic capital," and "yōkai capital." Other "capitals" included those glossed as "pseudo-capital" (Wuhan), "old capital" (Nanjing), and "abandoned capital" (Xi'an). These nicknames were manually translated to standard geonames in English and automatically converted to coordinates using the GeoNames API.
O GeoNames, "About GeoNames," http://www.geonames.org/about.html.
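The normalization of user-reported locations described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually used: the lookup table is hypothetical and contains only the English-language nicknames quoted above, the "/" separator for multiple places follows the Nanjing/New York example, and the subsequent GeoNames lookup is omitted.

```python
# Hypothetical nickname table; only nicknames quoted in the text are included.
NICKNAMES = {
    "ceramic country": "China",
    "country of harmony": "China",
}


def normalize_location(raw):
    """Return a standard place name, or None if the field is blank.

    Assumptions: multiple places are separated by "/", and the first
    one is kept, as in the Nanjing/New York example.
    """
    text = raw.strip()
    if not text:
        return None
    first = text.split("/")[0].strip()
    return NICKNAMES.get(first.lower(), first)


print(normalize_location("Nanjing/New York"))
print(normalize_location("Ceramic Country"))
```

The normalized names would then be passed to a geocoding service to obtain coordinates.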

LONGITUDE AND LATITUDE COORDINATES
The fourth source of location information for some accounts is longitude and latitude coordinates, which were included as part of the metadata. Some of these coordinates included a prefix such as ÜT or iPhone, which we suspect are fingerprints left by Über Twitter or other third-party applications. After the prefixes were removed, these coordinates could be used directly to map users. The dataset for this study included longitude and latitude coordinates for 1,378 users, which accounts for 16.7% of the sample.
Another 10.4% of the users reported unrecognizable geonames. Among them, some were metaphorical locations, e.g., # (inside/outside the [Great Fire] Wall) and "404 not found"; some were fictitious places, e.g., "Gotham City," "Airstrip One, Oceania," and "Oz"; some were statements, e.g., "can't tell you because I may be chased after." The remaining 16.7% of the users left the location field blank.
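The prefix removal applied to coordinate strings can be sketched in Python. The exact layout of these strings is not documented here, so the pattern below is an assumption: an optional client prefix such as "ÜT:" or "iPhone:", followed by two signed decimal numbers separated by a comma, read as latitude then longitude. The function name is hypothetical.

```python
import re

# Assumed layout: optional client prefix, then "lat,lon" as signed decimals.
COORD_RE = re.compile(
    r"^\s*(?:[^\d\s+-]+\s*:?\s*)?"
    r"([+-]?\d+(?:\.\d+)?)\s*,\s*([+-]?\d+(?:\.\d+)?)\s*$"
)


def parse_coordinates(raw):
    """Return (lat, lon) as floats, or None if the string is not a coordinate pair."""
    match = COORD_RE.match(raw)
    if not match:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject values outside the valid latitude/longitude ranges.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


print(parse_coordinates("ÜT: 39.9042,116.4074"))
print(parse_coordinates("404 not found"))
```

Strings that do not match, such as metaphorical or fictitious place names, return None and would fall into the unrecognizable category.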

MAPPING GROUPS AND CLUSTERS
After we normalized the locations and converted them into longitude and latitude, we created heatmaps with the help of Google Fusion Tables to illustrate users' geographic distribution (Figure 14). P Heatmaps better illustrate the density of users, and when weighted by follower count, they show how influence is distributed geographically. Overall, most users in the sample were from Beijing/China, Shanghai, and Tokyo. However, if weighted by follower count, we found more influential users were from Tokyo and the US coasts, which indicates the "super nodes" in China had far fewer followers compared to their counterparts elsewhere because Twitter does not have a mass audience in the country.
Twitter users in the political, cultural, and technology groups were mostly inside China, whereas those in the mixed group were mostly from outside the country. On the cluster level, some showed tight geographic distribution, e.g., the Hong Kong Users and Taiwanese Users clusters, like various clusters in the Arabic blogosphere (Etling et al. 2010). There are various reasons to explain why geographic proximity helps cluster people. First, people living in the same area have more chances to meet in person and build relationships. Second, geographic clustering represents other similarities among people, such as cultural background and socioeconomic status, and therefore facilitates connections among people. The Tech Entrepreneurs cluster had its leading accounts based in San Francisco and New York, and the ACG Guangzhou cluster had influential users from Guangzhou, where the ACG industry prospers.

INSIDE OR OUTSIDE THE GREAT FIREWALL
As described earlier, we used these data to assess whether users are located within or outside of the Great Firewall. For this broader scale distinction, we also draw on time zone information, in addition to the user-provided location data and geographic coordinates. Drawing on these data, we are able to infer location for a larger portion of the users in the network, including some with conflicting information or dubious reliability. In using time zone information, we make two key assumptions: 1) Although some Chinese people set their time zones outside the country for ideological or technical reasons, we assumed that users outside China would not configure their time zones to locations within mainland China unless they lived there, and postulated that people with a Chinese time zone (Beijing, Chongqing, and Urumqi) were in fact residing in China. 2) If the Alaska time zone was chosen, but other information indicated a Chinese user, we assumed that access was from inside China. When these two rules failed to apply, we marked the data with missing values.
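The decision rules above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the code used for this study. The order of precedence (coordinates, then matched location name, then time zone) follows the description in the main text; how conflicting sources were actually reconciled is not specified, so this ordering is our assumption. The function name and its boolean inputs are hypothetical and stand in for upstream steps such as geocoding.

```python
CHINA_TIME_ZONES = {"Beijing", "Chongqing", "Urumqi"}


def inside_great_firewall(coord_in_china=None, place_in_china=None,
                          time_zone=None, other_signals_chinese=False):
    """Return True (inside), False (outside), or None (missing value).

    coord_in_china / place_in_china are hypothetical precomputed flags:
    True or False when the source is available, None when it is not.
    """
    if coord_in_china is not None:        # 1. geographic coordinates
        return coord_in_china
    if place_in_china is not None:        # 2. matched user-provided location
        return place_in_china
    if time_zone in CHINA_TIME_ZONES:     # 3. time zone assumption 1
        return True
    if time_zone == "Alaska" and other_signals_chinese:  # assumption 2
        return True
    return None                           # neither rule applies


print(inside_great_firewall(time_zone="Chongqing"))
print(inside_great_firewall(time_zone="Hong Kong"))
```

Users for whom the function returns None correspond to the records marked with missing values.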