Analyzing Accessibility of Wikipedia Projects Around the World
May 2017

Justin Clark
Robert Faris
Rebekah Heacock Jones

INTERNET MONITOR is a research project to evaluate, describe, and summarize the means, mechanisms, and extent of Internet content controls and Internet activity around the world. thenetmonitor.org

INTERNET MONITOR is a project of the Berkman Center for Internet & Society. http://cyber.harvard.edu
23 Everett Street • Second Floor • Cambridge, Massachusetts 02138
+1 617.495.7547 • +1 617.495.7641 (fax) • http://cyber.harvard.edu • hello@cyber.harvard.edu

ABSTRACT

This study, conducted by the Internet Monitor project at the Berkman Klein Center for Internet & Society, analyzes the scope of government-sponsored censorship of Wikimedia sites around the world. The study finds that, as of June 2016, China was likely censoring the Chinese language Wikipedia project, and Thailand and Uzbekistan were likely interfering intermittently with specific language projects of Wikipedia as well. However, considering the widespread use of filtering technologies and the vast coverage of Wikipedia, our study finds that, as of June 2016, there was relatively little censorship of Wikipedia globally. In fact, we found less censorship in June 2016 than before Wikipedia’s transition to HTTPS-only content delivery in June 2015. HTTPS prevents censors from seeing which page a user is viewing, which means censors must choose between blocking the entire site and allowing access to all articles. This finding suggests that the shift to HTTPS has helped ensure access to knowledge.
The study identifies and documents the blocking of Wikipedia content using two complementary data collection and analysis strategies: a client-side system that collects data from the perspective of users around the globe and a server-side tool that analyzes traffic coming into Wikipedia servers. Both client- and server-side methods detected events that we consider likely related to censorship, in addition to a large number of suspicious events that remain unexplained. The report presents the results of our data analysis and insights into the state of access to Wikipedia content in 15 selected countries.

AUTHORS

Justin Clark is a software developer at the Berkman Klein Center for Internet & Society at Harvard University. Most recently, he has been adapting, designing, and crafting systems for mapping the contours of information control on the Internet.

Robert Faris is the Research Director at the Berkman Klein Center for Internet & Society at Harvard University. His recent research has focused on developing and applying methods for studying the networked public sphere.

Rebekah Heacock Jones is a former senior project manager for the Berkman Klein Center for Internet & Society.

ACKNOWLEDGEMENTS

The authors would like to thank the following people for their helpful contributions and feedback: Grant Baker, Patrick Drown, Urs Gasser, Casey Tilton, Zhou Zhou, and Jonathan Zittrain.
COVER IMAGE
“Greater Chicago Metropolitan Area”
Image Credit: (NASA, International Space Station, 02/02/12) https://spaceflight.nasa.gov/gallery/images/station/crew-30/html/iss030e062540.html

Analyzing Accessibility of Wikipedia Projects Around the World [1]

This paper can be downloaded without charge at:
The Berkman Center for Internet & Society Research Publication Series: https://cyber.law.harvard.edu/publications/2017/04/WikipediaCensorship
The Social Science Research Network Electronic Paper Collection: https://ssrn.com/abstract=2951312

Suggested citation: Clark, Justin and Faris, Robert and Jones, Rebekah Heacock, Analyzing Accessibility of Wikipedia Projects Around the World (May 2017). Berkman Klein Center Research Publication Series. Available at SSRN: https://ssrn.com/abstract=2951312

[1] This report was supported by the Wikimedia Foundation, and the Berkman Klein Center was selected as the research partner in a project to map the accessibility of Wikipedia around the world. The project is listed in Wikimedia's Research wiki here: https://meta.wikimedia.org/wiki/Research:Analyzing_Accessibility_of_Wikipedia_Projects_Around_the_World.
Table of Contents
Introduction
Methods
Findings By Country
  China
  Cuba
  Egypt
  Indonesia
  Iran
  Kazakhstan
  Pakistan
  Russia
  Saudi Arabia
  South Korea
  Syria
  Thailand
  Turkey
  Uzbekistan
  Vietnam
Additional Findings
  Article-level Analysis
  Project-level Analysis
  Client-side Analysis
Next Steps and Conclusions
Appendix A: Wikipedia Projects
Appendix B: Client Test Details
Appendix C: Likely Censored Persian Articles
Appendix D: Dates With Widespread Anomalies
Appendix E: Article Analysis Methods In-Depth

Introduction

As one of the largest online repositories of user-generated content in the world, covering topics that range from the general reference[3] to the highly controversial,[4] Wikipedia has repeatedly found itself the target of government censors in countries ranging from China to Iran to Uzbekistan. In some cases, individual articles have been singled out: Turkey has blocked a handful of articles related to reproductive biology, as well as at least one political article;[5] in 2008, a number of ISPs in the United Kingdom blocked access to an article about the German band Scorpions' album "Virgin Killer," whose cover art was a provocative image of a naked child.[6] In other cases, one or two offending articles have prompted wholesale blocks of the site: Russia has intermittently blocked access to all of Wikipedia over concerns about articles related to smoking marijuana;[7] and in 2006, Pakistan temporarily blocked the site in response to an article on "Draw Mohammed Day," which violated certain religious prohibitions against visual depictions of Mohammed.[8] Syria,[9] China,[10] Iran,[11] Tunisia,[12] and Uzbekistan[13] have all blacklisted the site at various times without publicly citing specific content concerns. A detailed look at the filtering of specific Wikipedia articles can serve as a window into the kinds of content (political, historical, religious, sexual, cultural, drug- or alcohol-related) that trigger censorship in different countries.

[3] "Portal:Contents/Categories: General Reference," Wikipedia, https://en.wikipedia.org/wiki/Portal:Contents/Categories#General_reference.
[4] "Wikipedia:List of controversial issues," Wikipedia, https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues.
[5] "Wikipedia releases warning on Turkey’s censorship, monitoring," Hurriyet Daily News, Jun 19, 2015, http://www.hurriyetdailynews.com/wikipedia-releases-warning-on-turkeys-censorship-monitoring.aspx?pageID=238&nid=84255.
[6] Jillian C. York, "UK Blocks Access to Wikipedia Entry on Controversial Scorpions Album," OpenNet Initiative, Dec 9, 2008, https://opennet.net/blog/2008/12/uk-blocks-access-wikipedia-entry-controversial-scorpions-album.
[7] "Russian media regulator confirms Wikipedia blacklisted," Russia Beyond the Headlines, Apr 5, 2013, http://rbth.com/news/2013/04/05/russian_media_regulator_confirms_wikipedia_blacklisted_24706.html. Amar Toor, "Russia banned Wikipedia because it couldn’t censor pages," The Verge, Aug 27, 2015, http://www.theverge.com/2015/8/27/9210475/russia-wikipedia-ban-censorship.
[8] aacool, "Pakistan Blocks Wikipedia," Blogcritics, Mar 31, 2006, http://blogcritics.org/pakistan-blocks-wikipedia/.
[9] "Syrian Youth Break Through Internet Blocks," IWPR, http://www.css.ethz.ch/content/specialinterest/gess/cis/center-for-securities-studies/en/services/digital-library/articles/article.html/88422.
[10] "Authorities block access to online encyclopaedia," Reporters Without Borders / IFEX, Oct 21, 2005, http://www.ifex.org/china/2005/10/21/authorities_block_access_to_online/.
[11] "New York Times website unblocked, YouTube still inaccessible," Reporters Without Borders, Dec 7, 2006, http://archives.rsf.org/print.php3?id_article=20016.
[12] Alice Backer, "Tunisia: Censoring Wikipedia?," Global Voices, Nov 27, 2006, https://globalvoices.org/2006/11/26/tunisia-censoring-wikipedia/.
[13] "Uzbekistan Blocks Its Wikipedia," Sputnik News, Feb 17, 2013, http://sputniknews.com/world/20120217/171367528.html.

Censorship of Wikipedia became slightly more complex, however, when the site added HTTPS support across all of its language projects in October 2011.[14] HTTPS makes blocking specific pages on a domain significantly more difficult by preventing censors from seeing exactly which page on a website is being visited. Censors who want to prevent users from accessing individual Wikipedia articles must therefore choose between blocking the entire site, including inoffensive articles, and not blocking anything at all. Because users could reach Wikipedia over either HTTP or HTTPS, in some countries where individual articles had been blocked on the HTTP version of the site, the entire site was now available through HTTPS. In China, users suddenly had access to hundreds of previously blocked articles.[15] This lasted for over 18 months, until China blocked the entire HTTPS version of the site in May 2013, forcing users back to the filtered HTTP version.[16] Iran, which in 2013 was found to be blocking more than 1,000 individual articles on Persian-language Wikipedia,[17] appears to have left access to the HTTPS version open. In June 2015, to increase privacy protection and uncensored site access for its users, the Wikimedia Foundation, which hosts Wikipedia, removed the option to access its sites over HTTP and transitioned fully to HTTPS across all of its sites.[18] While some users lamented the switch, arguing for a "some information is better than none" approach to dealing with censorship, the freedom of expression community generally perceived the move as a positive step,[19] and its effects may already be evident: in August 2015, Russia once again blacklisted Wikipedia over a single cannabis-related article, but reversed the ban less than 24 hours later.[20]

This report identifies and documents the blocking of Wikipedia content using two complementary data collection and analysis strategies: a client-side system that collected data from the perspective of users around the globe and a server-side tool that analyzed traffic coming into Wikipedia servers. Collecting and reviewing client-side data allowed us to observe censorship directly; our server-side analysis made use of preexisting data that covered potentially every URL Wikimedia served. Combining these two approaches let us leverage the advantages of each to form a more comprehensive picture of censorship of Wikipedia. Our server-side analysis tracked requests for 1.7 million articles spanning hundreds of languages from November 2011 to late April 2016, as well as the daily number of requests from each country to each Wikipedia language project from May 2015 through June 2016. Our client-side analysis, which took place primarily in June 2016, covered all of Wikimedia's 292 language projects.

[14] Ryan Lane, "Native HTTPS support enabled for all Wikimedia Foundation wikis," Wikimedia, Oct 3, 2011, https://blog.wikimedia.org/2011/10/03/native-https-support-enabled-for-all-wikimedia-foundation-wikis/.
[15] "Wikipedia Drops the Ball on China—Not Too Late to Make Amends," Greatfire, Jun 3, 2013, https://en.greatfire.org/blog/2013/jun/wikipedia-drops-ball-china-not-too-late-make-amends.
[16] Thomas Fox-Brewster, "Wikipedia Disturbed Over Fresh China Censorship," Forbes, May 22, 2015, http://www.forbes.com/sites/thomasbrewster/2015/05/22/wikipedia-disturbed-over-fresh-china-censorship/#295de6885f84.
[17] Nima Nazeri and Collin Anderson, "Citation Filtered: Iran’s Censorship of Wikipedia," Center for Global Communication Studies, Nov 2013, http://www.global.asc.upenn.edu/fileLibrary/PDFs/CItation_Filtered_Wikipedia_Report_11_5_2013-2.pdf.
[18] Yana Welinder, Victoria Baranetsky, and Brandon Black, "Securing access to Wikimedia sites with HTTPS," Wikimedia, Jun 12, 2015, https://blog.wikimedia.org/2015/06/12/securing-wikimedia-sites-with-https/.
[19] Parker Higgins, "Russia's Wikipedia Ban Buckles Under HTTPS Encryption," Electronic Frontier Foundation, Aug 28, 2015, https://www.eff.org/deeplinks/2015/08/russias-wikipedia-ban-buckles-under-https-encryption.
[20] Shaun Walker, "Russia briefly bans Wikipedia over page relating to drug use," The Guardian, Aug 25, 2015, https://www.theguardian.com/world/2015/aug/25/russia-bans-wikipedia-drug-charas-https.
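The introduction's point about HTTPS can be made concrete with a small model of what an on-path network observer, such as a national filter, can read from a single request. This is a simplified sketch, not a protocol analysis. It assumes that over plain HTTP the full article path is visible in cleartext, while over HTTPS only the hostname leaks, through the DNS lookup and the TLS Server Name Indication (SNI) field. The function name `visible_to_censor` is illustrative only.

```python
from urllib.parse import urlsplit


def visible_to_censor(url: str) -> str:
    """Return the part of a request URL that an on-path filter can read.

    Simplified model (an assumption for illustration):
    - http:  the request line and Host header travel in cleartext, so the
             filter sees the hostname *and* the article path.
    - https: the path is encrypted; only the hostname leaks, via the DNS
             lookup and the TLS Server Name Indication (SNI) field.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        return parts.netloc + parts.path
    if parts.scheme == "https":
        return parts.netloc
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


if __name__ == "__main__":
    article = "fa.wikipedia.org/wiki/Example"  # hypothetical article path
    for scheme in ("http", "https"):
        print(scheme, "->", visible_to_censor(f"{scheme}://{article}"))
    # Prints:
    # http -> fa.wikipedia.org/wiki/Example
    # https -> fa.wikipedia.org
```

Under HTTPS the filter's only lever is the hostname. That is why it must choose between blocking an entire language project and allowing every article on it.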
Both client- and server-side methods detected events that we consider likely related to censorship, in addition to a large number of suspicious events that remain unexplained. The blocking of Chinese Wikipedia in China starting in May 2015 was identified in the server-side article data, the server-side project data, and the client-side data. We identified a number of articles that appeared to be censored on Persian Wikipedia prior to the transition to HTTPS-only delivery. Our client-side analysis witnessed transitory but intentional blocking of Yiddish Wikipedia in Thailand, as well as an unconfirmed but highly suspicious inability to access Uzbek Wikipedia from Uzbekistan. This latter event correlated with a highly anomalous decrease in traffic from Uzbekistan to Uzbek Wikipedia apparent in the server-side data. Article analysis uncovered a suspicious decrease in historical traffic to Vietnamese articles related to sex and sexuality. Analysis of project-level data uncovered a number of significant decreases in traffic from various countries that correlated with in-country events. These events ranged from natural disasters to political upheaval and affected access not only to Wikipedia but to the Internet more broadly.

Methods

While this study has the simply stated goal of analyzing the accessibility of Wikipedia around the world, the methods it requires are more complex. We broke the problem down into three separate questions: where is Wikipedia blocked, how is it blocked, and why is it blocked?

To assess where Wikipedia is currently blocked, we used two methods. One looked at the levels of traffic to Wikimedia's servers, and the other made requests for the various Wikipedia projects from vantage points around the world. We refer to these two methods as "server-side" and "client-side" analysis, respectively, throughout the report.
Our server-side data analysis consisted of running an anomaly detection algorithm[21] on the daily number of requests from every country to each of Wikipedia's 292 language projects.[22] This data was available from May 2015 through June 2016, and we were given access to it on Wikimedia’s servers under a non-disclosure agreement. When run against this data, the anomaly detection algorithm output an "anomalousness" score for each day's number of requests, where a negative score meant fewer requests than expected and a positive score meant more requests than expected. We then filtered the resulting anomalies down to only the most negative ones, generated graphs of these anomalous events, and manually reviewed the graphs for patterns that might indicate possible censorship events. For cases in which we were interested in specific countries, we generated graphs regardless of the automatically detected anomalies and manually reviewed these as well.

[21] This algorithm consists mainly of Robust Principal Component Analysis and is described in more detail in Appendix E.
[22] For a list of the projects, see Appendix A.

Our client-side analysis consisted of performing repeated requests to each of Wikipedia's projects from 41 network vantage points located in 40 countries. We chose these countries because they made up our entire testing network as of June 2016.[23] From each of our test locations, we requested URLs of the pattern "http://(project code).wikipedia.org/wiki/," where "(project code)" is the code Wikimedia assigns to each of Wikipedia's language projects (e.g., "http://en.wikipedia.org/wiki/" for English Wikipedia). It is important to note that because we did not have access to in-country DNS servers at the time of testing, all DNS resolution took place using Google's public DNS servers (8.8.8.8 and 8.8.4.4).
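The client-side procedure described in this section can be sketched as follows. It builds each project's URL from its code, requests it, retries on failure until the request succeeds or the project is judged unavailable, and records the final URL and round-trip time. This is a minimal sketch under stated assumptions, not the tooling we deployed. The give-up rule (`max_attempts=3`), the `ProbeResult` record, and the injectable `fetch` function are all hypothetical, and screenshots are omitted.

```python
import http.client
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    project: str                         # Wikimedia project code, e.g. "en"
    url: str                             # URL that was requested
    ok: bool                             # True if any attempt succeeded
    attempts: int                        # number of attempts made
    final_url: Optional[str] = None      # URL after following redirects
    rtt_seconds: Optional[float] = None  # time to receive the full response


def project_url(code: str) -> str:
    # Pattern from the text: http://(project code).wikipedia.org/wiki/
    return f"http://{code}.wikipedia.org/wiki/"


def urllib_fetch(url: str, timeout: float = 30.0) -> str:
    """Fetch url, following redirects, and return the final URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # wait for the complete body, not just the headers
        return response.url


def probe(code: str, fetch: Callable[[str], str] = urllib_fetch,
          max_attempts: int = 3) -> ProbeResult:
    """Request one project, retrying until success or max_attempts failures."""
    url = project_url(code)
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            final_url = fetch(url)
        except (OSError, http.client.HTTPException):
            # Network errors, timeouts, and HTTP error statuses (all OSError
            # subclasses in urllib) or malformed responses: try again.
            continue
        return ProbeResult(code, url, True, attempt, final_url,
                           time.monotonic() - start)
    # Every attempt failed: treat the project as likely unavailable here.
    return ProbeResult(code, url, False, max_attempts)
```

A real run would call `probe(code)` for every project code from every vantage point. Keeping `fetch` injectable lets the retry logic be checked without network access.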
Because we relied on Google's public resolvers rather than in-country DNS servers, we were unable to detect any manipulation of requests for Wikipedia that took place only at the DNS level.[24] For each request we performed, we collected the time it took for the request to complete, the final URL of the response after following all redirects, and a screenshot of the resulting page as a user would see it. For any request that failed on the initial attempt, we repeated the request until we either received a successful response or judged the domain likely unavailable from that vantage point. Once all the responses were collected, we reviewed the data for any irregularities that might indicate blocking or throttling.

Originally, to answer how Wikipedia might be censored, we intended to use full packet captures of our client tests to identify the precise technological method used to interfere with requests. For example, packet captures could be used to distinguish among IP blocking, injected TCP reset packets, DNS poisoning, injected HTTP redirects, TLS certificate spoofing, and other methods. It is also sometimes possible to identify the use of specific censorship products by looking for distinctive traits they leave in packet captures.[25][26] Unfortunately, technical limitations in the deployment of our client network prevented us from collecting these packet captures. Therefore, in observed cases of blocking, we could do little but speculate about the exact technological method of censorship.

Apart from (honest) statements by governments and ISPs, the best way we have to learn why censors block what they do is to look at their historical actions for clues to their motivations. To that end, we used two methods to build context around censorship events that might help us understand motivations. First, we performed traditional research to identify and summarize key themes in the history of censorship in several countries around the world.
Second, we attempted to use traffic data for specific Wikipedia articles to locate historical instances of potential censorship, in the hope that these instances would surface themes and help bolster existing research.

[23] The full list of the countries in which our test nodes were located is provided in Appendix B.
[24] Further detail of our client-side collection and its potential drawbacks is provided in Appendix B.
[25] "Behind Blue Coat: Investigations of commercial filtering in Syria and Burma," Citizen Lab, Nov 9, 2011, https://citizenlab.org/2011/11/behind-blue-coat/.
[26] Clayton, et al., "Ignoring the Great Firewall of China," Jun 2006, https://www.cl.cam.ac.uk/~rnc1/ignoring.pdf.

Our method of detecting potential censorship of articles using traffic data was fairly intuitive. We started with the hypothesis that if an article receives enough traffic that its number of requests per chosen period is rarely zero, and that article is then censored for a sizable portion of its audience, traffic to the article will likely decrease by a detectable amount. For example, if an article typically sees around 100 requests per day and suddenly drops to 10 requests per day for a week, we can assume something has changed. We would then investigate that change event to identify potential causes. To search for such events, we built an anomaly detection pipeline that could automatically detect significant deviations from the normal pattern of requests.[27] We then detected anomalies in the daily request histories from December 2011 through late April 2016 for approximately 1.7 million articles. Our method of selecting this set of articles was designed to favor articles that we considered more likely to be censored.
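The intuition above (an article that normally gets about 100 requests a day falling to about 10 for a week) can be illustrated with a deliberately simpler detector than the one we used. The report's pipeline is based on Robust Principal Component Analysis (see Appendix E). The sketch below swaps in a trailing median/MAD robust z-score instead, which shares the key property that negative scores mean fewer requests than expected. The window length, warm-up period, score threshold, and minimum run length are hypothetical values chosen for illustration.

```python
from statistics import median


def robust_scores(counts, window=28, warmup=7):
    """Score each day against the median and MAD of the preceding days.

    Negative scores mean fewer requests than expected; positive scores mean
    more. NOTE: a simple stand-in for illustration, not Robust PCA.
    """
    scores = []
    for i, value in enumerate(counts):
        history = counts[max(0, i - window):i]
        if len(history) < warmup:          # not enough history to judge yet
            scores.append(0.0)
            continue
        center = median(history)
        mad = median(abs(x - center) for x in history)
        scale = 1.4826 * mad or 1.0        # 1.4826 * MAD ~ std. dev. for normal data
        scores.append((value - center) / scale)
    return scores


def sustained_drops(scores, threshold=-5.0, min_days=3):
    """Return (first_day, last_day) index pairs where the score stays at or
    below `threshold` for at least `min_days` consecutive days."""
    runs, start = [], None
    for i, score in enumerate(list(scores) + [float("inf")]):  # sentinel ends a final run
        if score <= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_days:
                runs.append((start, i - 1))
            start = None
    return runs


if __name__ == "__main__":
    # Synthetic history: ~100 requests/day, then 10/day for one week (days 30-36).
    counts = [100 + (day % 5) - 2 for day in range(50)]
    for day in range(30, 37):
        counts[day] = 10
    print(sustained_drops(robust_scores(counts)))  # [(30, 36)]
```

Because the baseline is a trailing median, a block lasting longer than about half the window begins to look normal to this detector. That limitation is one reason a model fitted to the whole history, such as the RPCA approach in Appendix E, may suit long outages better.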
The final set of 1.7 million articles covered 286 distinct Wikipedia language projects (out of the total 292), 132 of which were represented by more than 10,000 articles. All of the detected anomalies were collected in a searchable database.

It is important to note that daily requests to articles were not broken out by geography. Instead, each data point represented the number of requests in a day for a given article from everywhere on Earth. This meant that we could not definitively attribute any given article anomaly to requests from a particular country. We could only assume that the anomaly was most likely related to the country that contributed the largest share of requests to the article's language project. For example, if we located an anomaly in the request history of an article on Persian Wikipedia, the fact that 83.5% of requests for Persian Wikipedia come from Iran gave us some confidence that the anomaly could be related to Iran. On the other hand, if we located an anomaly in an article on English Wikipedia, we felt that we could not claim the anomaly was related to any single country, as nine countries each contribute more than one percent of the total requests for English Wikipedia.[28] While request data broken out by both article and geography existed, only a small amount of this data was relevant to our analysis, and we therefore opted to use a different data source.[29] This choice cost us some of the interpretive power we had hoped to achieve with our methodology.

Once we had run all of the article histories through our pipeline, we manually reviewed and investigated the anomalies that represented the most severe and longest-lasting decreases in request traffic. This manual review phase was both necessary and slow.
We found it necessary because large decreases in traffic can be caused by many different processes (national holidays, network outages, articles being moved or redirected, bot activity, etc.), so determining whether an anomaly is likely a censorship event is a process of building evidence. Unfortunately, the large volume of detected anomalies and the amount of manual review our process required meant that we were not able to investigate all the significant anomalies individually. A full account of our article-level analysis methodology and the issues we encountered while implementing it is provided in Appendix E.

[27] This is the same algorithm used for the Wikipedia project-level analysis. For more information about the process and the algorithm (Robust Principal Component Analysis), see Appendix E.
[28] "Wikimedia Traffic Analysis Report - Page Views Per Wikipedia Language - Breakdown," Wikimedia, May 2016, https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm.
[29] For a fuller description of this decision, see Appendix E.

We use three kinds of graphs throughout this report. In the simplest case, we show the number of daily requests for a single article over time. In these graphs, vertical colored bars indicate detected anomalies: blue bars indicate fewer requests than expected, while red bars indicate more requests than expected. The depth of the hue roughly indicates the anomalousness of each anomaly relative to the other anomalies for the same article. Anomaly color bars are also included in project-level graphs where we felt they accurately highlighted important points, and they are excluded where they hindered interpretation.
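The graph conventions described in this section can be sketched as two small helpers. Both are illustrative assumptions rather than the plotting code behind this report. `bar_colors` divides each anomaly's magnitude by the largest magnitude for the same article, which is one plausible reading of "depth of hue relative to the other anomalies for the same article." `percent_change` implements the multi-article normalization, which plots the percent change in traffic since the start of the graph period.

```python
def bar_colors(scores):
    """Map signed anomaly scores for ONE article to (color, intensity) pairs.

    Blue marks fewer requests than expected (negative score) and red marks
    more. Intensity is |score| divided by the article's largest |score|, so
    hue depth is relative to the other anomalies for the same article.
    """
    peak = max((abs(s) for s in scores), default=0.0)
    return [("blue" if s < 0 else "red", abs(s) / peak if peak else 0.0)
            for s in scores]


def percent_change(series):
    """Normalize a request series to percent change since its first day, so
    articles with very different traffic levels share one vertical axis."""
    base = series[0]
    if base == 0:
        raise ValueError("cannot normalize a series that starts at zero")
    return [100.0 * (x - base) / base for x in series]
```

For example, `bar_colors([-4.0, 2.0])` yields a full-depth blue bar and a half-depth red bar, and `percent_change([100, 50, 150])` yields `[0.0, -50.0, 50.0]`.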
These project-level graphs do not contain numbers on their vertical axis because the data behind them is publicly available only at a less granular level. Numbers are also omitted on the vertical axis of graphs that depict multiple articles at once, but for a different reason: to account for the varying levels of traffic between articles, the vertical axis depicts the percent change in traffic since the start of the graph period. This effectively normalizes the number of requests across articles, and the axis is labeled accordingly. Anomaly color bars are omitted on multi-article graphs because they tended to hinder interpretability.

Findings By Country

The Internet's network topology does not closely mirror country boundaries, but the censorship decisions with the broadest impact are often made at the national level, so we believe the state is a useful unit of assessment. Below, we highlight a number of countries. We chose these countries either because they have reportedly blocked Wikipedia content at some point in the past or because we have evidence of past or present broader Internet censorship within them. For each country, we provide a short summary of the history and current state of local Internet filtering, followed by the most noteworthy country-specific results of our data analysis. These results may include tests from client locations, analysis of project-level data, or analysis of article-level data.

China

China’s Internet filtering apparatus is one of the most pervasive and complex in the world.
Freedom House has expressed strong concerns about China’s Internet freedoms, noting that the country uses a wide variety of techniques (IP blocking, throttling, man-in-the-middle attacks, deep packet inspection, DNS poisoning, keyword filtering, content removal, SMS and instant message filtering, the blocking of VPNs, and full Internet shutdowns in some areas) to block political and sexually explicit content, globally popular social media and publishing platforms, and Google and many of its services.[30] OpenNet Initiative research conducted from 2004 through 2012 documented extensive, ongoing filtering of political subjects (including the Tiananmen Square protests in 1989; Taiwanese independence; the Uyghur, Tibetan, and Mongolian separatist movements; and criticism of the ruling party); religious subjects (including Falun Gong and the Dalai Lama); international media; human rights groups; pornography; online gambling; social media platforms; and circumvention tools.[31][32] More recent research has suggested that criticism of the ruling party is largely tolerated, while content that has the potential to spur real-world collective action is of primary concern to censors.[33]

Chinese censors have a long and contentious history with Wikipedia. The first Chinese-language Wikipedia project, chinese.wikipedia.org, was launched in May 2001; the first Chinese-language article was published in October 2002, the same month the project moved to zh.wikipedia.org.[34][35] The project faced its first challenge from censors in June 2004, when it was temporarily blocked during the anniversary of the Tiananmen Square protests.[36] The entire project has been blocked on and off since; article-level filtering of sensitive content was reportedly instituted around 2006.[37] The introduction of an HTTPS version in 2011 temporarily gave users in China full access to the project, including articles blocked on the HTTP site.[38]

Sophisticated data analysis techniques were not required to identify if and when China has blocked access to Wikipedia. Time series graphs of the number of requests from China to the various Wikipedia projects make it clear when blocking occurred. Below is a graph of the daily number of requests to zh.wikipedia.org from China. A major censorship event is immediately apparent around May 19, 2015.

[Figure: Daily number of requests to zh.wikipedia.org from China]

[30] "China: Freedom on the Net 2015," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom-net/2015/china.
[31] "Internet Filtering in China in 2004-2005: A Country Study," OpenNet Initiative, 2005, https://opennet.net/studies/china.
[32] "China," OpenNet Initiative, Aug 9, 2012, https://opennet.net/research/profiles/china.
[33] Gary King, Jennifer Pan, and Margaret E. Roberts. 2014. “Reverse-engineering censorship in China: Randomized experimentation and participant observation.” Science, 6199, 345: 1-10, http://gking.harvard.edu/files/gking/files/experiment_0.pdf.
[34] "Chinese Wikipedia," Wikipedia, https://en.wikipedia.org/wiki/Chinese_Wikipedia.
[35] Wikipedia offers several Chinese-language projects in addition to zh.wikipedia.org, which is written in Mandarin and automatically converted, based on user preference, into traditional or simplified characters, incorporating nationally variant vocabulary: Cantonese (https://zh-yue.wikipedia.org), Classical Chinese (https://zh-classical.wikipedia.org), and Min Nan (https://zh-min-nan.wikipedia.org).
[36] Philip P. Pan, "Reference Tool On Web Finds Fans, Censors," The Washington Post, Feb 20, 2006, http://www.washingtonpost.com/wp-dyn/content/article/2006/02/19/AR2006021901335.html.
[37] "Wikipedia unblocked in China after year-long ban," oneindia, Nov 16, 2006, http://www.oneindia.com/2006/11/16/wikipedia-unblocked-in-china-after-year-long-ban-1163687797.html.
[38] "Wikipedia Drops the Ball on China—Not Too Late to Make Amends," Greatfire, Jun 3, 2013, https://en.greatfire.org/blog/2013/jun/wikipedia-drops-ball-china-not-too-late-make-amends.

News reports around the time of this event corroborate that it was indeed caused by intentional government censorship.[39] We analyzed similar graphs for Wikipedia's 291 other language projects and saw no indications of similar anomalies.

While data analysis is not necessary to detect obvious and documented censorship events, our article-level analysis also picked up this anomaly. As would be expected from this type of censorship, thousands of articles hosted on zh.wikipedia.org saw strong downward anomalies at the same time.

[Figure: Requests to multiple zh.wikipedia.org articles around May 2015]

It is important to reiterate that this graph depicts the number of requests to these articles from all geographic locations, not just those originating in China. These anomalies are detectable only because a large portion of the worldwide traffic to these Chinese-language articles originated in China. Wikipedia's transition to HTTPS-only delivery occurred in June 2015, almost four weeks after China blocked access to all of zh.wikipedia.org.
For that reason, we were unable to analyze the effect of the HTTPS-only transition on the number of requests for Chinese articles.

Using our client network, we were able to confirm that this censorship was ongoing as of late June 2016. We were unable to access zh.wikipedia.org from either of our two testing locations in mainland China. While technical limitations in the current deployment of our client network prevent us from pinpointing the exact method of censorship, the technological methods of censorship employed by China are extensively documented elsewhere. 40

39 Thomas Fox-Brewster, "Wikipedia Disturbed Over Fresh China Censorship," Forbes, May 22, 2015, http://www.forbes.com/sites/thomasbrewster/2015/05/22/wikipedia-disturbed-over-fresh-china-censorship/#295de6885f84.

We also confirmed that zh.wikipedia.org was the only Wikipedia project affected by this censorship: our client machines in both locations were able to successfully and reliably access the other 291 Wikipedia subdomains. 41 To check for throughput limitations, whether intentional ("throttling") or not, we timed how long it took for a complete response to reach our test clients after sending each request. We refer to this time period as the "round-trip time" ("RTT") throughout. We calculated the median, mean, and maximum round-trip times to each of the projects from both of our test locations.
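The per-location summary described above can be sketched in Python as follows. This is a minimal illustration, not our client software: it assumes each test client logs one (project domain, round-trip time in milliseconds) pair per request, and the sample timings are hypothetical. Reporting the median alongside the mean matters because a single slow response inflates the mean and maximum while leaving the median nearly unchanged.

```python
from statistics import mean, median


def summarize_rtts(samples):
    """Summarize round-trip times (ms) collected at one test location.

    samples: list of (project_domain, rtt_ms) tuples, one per request.
    Returns the median and mean RTT across all requests, plus the
    maximum RTT and the project that produced it.
    """
    rtts = [rtt for _, rtt in samples]
    slowest_project, max_rtt = max(samples, key=lambda s: s[1])
    return {
        "median_ms": median(rtts),
        "mean_ms": round(mean(rtts), 1),
        "max_ms": max_rtt,
        "max_project": slowest_project,
    }


# Hypothetical timings for illustration only.
samples = [
    ("en.wikipedia.org", 480),
    ("de.wikipedia.org", 510),
    ("ca.wikipedia.org", 15000),
    ("fr.wikipedia.org", 550),
    ("ja.wikipedia.org", 620),
]
print(summarize_rtts(samples))
```

Here one 15-second response pulls the mean above 3 seconds while the median stays at 550 ms, which is why an isolated maximum is worth re-testing before it is read as throttling.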
The results for our tests from China are summarized below:

Location     Median RTT   Mean RTT   Max RTT
Location 1   550 ms       728 ms     15236 ms (to ca.wikipedia.org)
Location 2   492 ms       531.8 ms   1492 ms (to tw.wikipedia.org)

The maximum round-trip time for ca.wikipedia.org is exceptionally long, but subsequent tests were not significantly different from the median.

Our article-level analysis of zh.wikipedia.org found almost 5,500 significant downward anomalies in the number of requests across more than 4,200 articles. A large fraction of these occur around the May 19, 2015 blocking event. While we did not uncover any other article-level events on zh.wikipedia.org that we consider likely censorship, we did encounter events that are both highly anomalous and currently unexplained. For example, articles covering a number of accented letters (Ṗ, Ṁ, Ṫ, Ŗ, Ẋ) saw steep declines in requests beginning September 16, 2014, and all recovered at the same time in mid-October 2014:

40 Young Xu, "Deconstructing the Great Firewall of China," Thousand Eyes, Mar 8, 2016, https://blog.thousandeyes.com/deconstructing-great-firewall-china/.
41 One project, Wikipedia in the Nuosu language (ii.wikipedia.org), returned content that was consistent with other Wikipedia projects, but returned a 404 HTTP status code in all our client tests from all locations.

Because these articles contain little content, their request counts recovered overnight, and the article histories show nothing that might explain these changes (such as deleting or renaming the articles), we suspect this behavior might reflect either changes in external links to the pages or a bot or other automated client temporarily suspending activity. Additionally, our analysis highlighted many anomalous events beginning around August 14, 2013, as well as around August 7, 2015.
Articles that were part of these events did not appear thematically related, but the traffic drops were significant, and the events were limited to articles in the zh.wikipedia.org domain. While our research did not identify an explanation for these dates, we document them here in the hope that they might hold some significance for those more familiar with either Wikipedia's infrastructure or Chinese manipulation of Internet traffic.

While article-level analysis added little to the historical record of Chinese Internet censorship, which, as outlined above, is widely available, it did serve to bolster the findings from our other methods. Our client tests showed that one Wikipedia domain, zh.wikipedia.org, was completely inaccessible in China, while all other projects were available. Wikimedia's own data on traffic to its projects showed obvious indications of the censorship events reported in the media. While Internet censorship in China is widespread, as of June 2016, Chinese censorship of Wikipedia appears limited to the zh.wikipedia.org domain.

Cuba

The past three years have seen considerable growth in Cuba’s Internet infrastructure, but access is still limited and tightly controlled. The country has two ISPs, both of which are state-owned, and Cuba uses the Avila Link monitoring software to track Internet users and obtain usernames and passwords. 42 Most Cubans are only permitted access to the national intranet, which includes a small selection of government-approved websites and services; access to the global public Internet is largely limited to a handful of public WiFi access points and expensive government-run Internet cafes. The Revolutionary Orientation Department (DOR) oversees filtering in the country. 43 Political content, including dissident blogs and news sites, is heavily filtered; common social media platforms such as Facebook and Twitter, VoIP services, and web services such as Yahoo and Hotmail are intermittently blocked. 44

42 "Cuba: Freedom on the Net 2015," Freedom House, October 2015, https://freedomhouse.org/report/freedom-net/2015/cuba.
43 "Cuba: Long live freedom (but not for the Internet)!," Reporters Without Borders, Mar 12, 2014, http://12mars.rsf.org/2014-en/2014/03/11/cuba-long-live-freedom-but-not-for-the-internet/.

We did not have a client testing node available in Cuba. Almost 100% of the requests coming out of Cuba are for either Spanish or English Wikipedia. 45 Visible in the graphs above is a steep decrease in traffic around the June 12, 2015 HTTPS-only transition. Apart from that, traffic from May 2015 to July 2016 does not show signs that might indicate widespread censorship. Our anomaly detection algorithm did not detect any significant anomalies in the request histories of any other Wikipedia project. Although access to the public Internet is restricted, we were unable to find any firm evidence that Cuba was censoring any Wikipedia project for those who do have access.

Egypt

Despite offering comparatively free and open access to a wide spectrum of online content, Egypt’s Internet environment is still tightly controlled. Political, social, and religious websites are broadly available, but arrests, attacks, self-censorship, and full Internet shutdowns contribute to an atmosphere of repression. Many activists are worried about a draft cybercrime bill introduced in 2015 that would allow the government to substantially expand its censorship role in the name of national security. 46 While this has not yet been enacted, other laws require owners of Internet cafes to track the identities and activities of customers online.

44 Ellery Biddle, "Rationing the Digital: The Policy and Politics of Internet Use in Cuba Today," Internet Monitor, Jul 10, 2013, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2291721.
45 "Wikimedia Traffic Analysis Report - Wikipedia Page Views Per Country - Breakdown," May 2016, https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryBreakdown.htm#Cuba
46 Ragab Saad, "Egypt’s Draft Cybercrime Law Undermines Freedom of Expression," Atlantic Council, Apr 24, 2015, http://www.atlanticcouncil.org/blogs/menasource/egypt-s-draft-cybercrime-law-undermines-freedom-of-expression.

VoIP services and encryption tools are also restricted under Egyptian telecommunications laws, but these laws are not widely enforced. Though it rarely filters online content, the Egyptian government is known for arresting bloggers and journalists critical of the country’s current leadership or of Islam. Access to the Internet was most limited during the early 2011 Egyptian revolution protests: for two days both Twitter and Facebook were blocked, and for four days after that, the Internet was down throughout the country. The state’s control over the country’s telecommunications infrastructure, which is primarily owned by government-run Telecom Egypt, enables the government to slow or completely cut off Internet traffic, mobile messaging, and SMS. 47 Due to the wide geographic spread of Arabic, historical article anomalies for Arabic Wikipedia are difficult to attribute to Egypt.
Noteworthy results from Arabic Wikipedia are presented in Additional Findings, below. Our client testing node in Egypt was able to successfully and reliably access all Wikipedia projects. Network timing of the responses from each of the projects showed no signs of throttling:

Median RTT   Mean RTT   Max RTT
207 ms       259.1 ms   1792 ms (to ceb.wikipedia.org)

Analysis of traffic to Arabic and English Wikipedias showed no major anomalies other than during the holidays around the beginning and end of Ramadan:

47 "Freedom on the Net 2015: Egypt," Freedom House, May 2015, https://freedomhouse.org/report/freedom-net/2015/egypt.

While holidays are rarely relevant when discussing the availability of websites, it is important to note their effect on web traffic, as they can look similar, both statistically and graphically, to other types of outage events. A large part of our manual review process was dedicated to identifying and setting aside holiday effects. As of June 2016, we had no evidence that Egypt censored any part of Wikipedia.

Indonesia

Internet censorship in Indonesia is managed by the Ministry of Communication and Information (MCI), which has broad powers to block "negative" content, mostly granted through the Information and Electronic Transactions Law (ITE). 48 MCI maintains a system called Trust Positive, 49 a database cataloguing content that should be censored, but the actual implementation of censorship is left up to the ISPs. As of June 2016, Trust Positive contained approximately 770,000 URLs, about 99.5% of which were categorized as pornographic. Due to the decentralized nature of the censorship infrastructure, some ISPs filter additional URLs while others do not enforce all of the government-mandated blocks.
50 For this reason, it is hard to attribute each censored website to the government. Though most of the content blocked under Indonesian law is pornographic, the relevant statutes are ambiguous, so content related to radicalism, violence, hate speech, fraud, gambling, child violence and pornography, internet security, and intellectual property rights is also censored. 51 The pornographic category itself is also very broadly defined. In 2010, the OpenNet Initiative documented evidence of substantial blocking of pornography across different ISPs, but this blocking also extended to websites related to women’s rights and LGBT issues. 52 Occasionally, LGBT content is specifically targeted by censors, despite being legal in the country. 53 Indonesian censors have also shown a willingness to block entire platforms over relatively small amounts of content, at various times blocking all of Netflix, Tumblr, Reddit, and Vimeo, mostly for nudity or sexually explicit content. 54 55 56

48 "Freedom on the Net 2015: Indonesia," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom-net/2015/indonesia.
49 See http://trustpositif.kominfo.go.id/.
50 "Freedom on the Net 2015: Indonesia," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom-net/2015/indonesia.
51 Ibid.
52 "Indonesia," OpenNet Initiative, Aug 9, 2015, https://opennet.net/research/profiles/indonesia.
53 "Indonesia bans gay emoji and stickers from messaging apps," The Guardian, Feb 11, 2016, https://www.theguardian.com/world/2016/feb/12/indonesia-bans-gay-emoji-and-stickers-from-messaging-apps.
54 Leo Kelion, "Netflix blocked by Indonesia in censorship row," BBC, Jan 28, 2016, http://www.bbc.com/news/technology-35429036.
55 "Indonesia to ban 477 websites over adult-rated contents," Xinhua News, Feb 17, 2016, http://news.xinhuanet.com/english/2016-02/17/c_135106798.htm.
56 Enricko Lukman, "Amid online porn crackdown, Vimeo, Reddit and Imgur are blocked in Indonesia," TechInAsia, May 14, 2014, https://www.techinasia.com/online-porn-crackdown-vimeo-reddit-imgur-blocked-indonesia.

Article-level analysis of Indonesian Wikipedia (id.wikipedia.org) uncovered a large number of anomalies, only some of which could be explained by innocuous causes. While we are not confident enough to claim that any of the anomalies we detected indicate censorship, we feel a number of them are worth highlighting as suspicious:

Start Date   Indonesian Article   English Article
2012-12-17   Facebook             Facebook
2014-09-25   Lolicon              Lolicon
2014-10-04   Hentai               Hentai
2014-10-17   Film_porno           Pornographic film
2015-05-03   Hubungan_sedarah     Incest

We consider these articles particularly suspicious because most of them are sexual in nature, which, as noted above, is a sensitive topic in Indonesia. None of the declines appear to be related to changes to the articles themselves that might otherwise explain significant traffic decreases (such as article deletion or renaming). We note, however, that none of the articles showed a substantial and sustained increase in traffic after the mid-June 2015 HTTPS-only transition, which we would expect for articles that had been censored.
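Start dates like those above come from flagging sustained downward departures in an article's daily request counts. The sketch below is a simplified, hypothetical stand-in rather than the detector used in this study: it compares each day against the median of the preceding 28 days, scales the deviation by the median absolute deviation (MAD), reports the first day of each anomalous run, and skips days near listed holidays to mimic the manual step of setting holiday effects aside. The window, threshold, holiday margin, and sample series are illustrative assumptions.

```python
from datetime import date, timedelta
from statistics import median


def downward_anomaly_starts(daily, window=28, threshold=5.0,
                            holidays=(), holiday_margin=3):
    """Return the first date of each run of abnormally low daily traffic.

    daily: list of (date, request_count) tuples sorted by date, one per day.
    window: number of preceding days used as the baseline (hypothetical default).
    threshold: how many scaled MADs below the baseline median counts as anomalous.
    holidays: dates whose +/- holiday_margin days are ignored, mimicking the
        manual review step that sets holiday effects aside.
    """
    def near_holiday(day):
        return any(abs((day - h).days) <= holiday_margin for h in holidays)

    starts = []
    in_run = False
    for i in range(window, len(daily)):
        day, count = daily[i]
        baseline = [c for _, c in daily[i - window:i]]
        med = median(baseline)
        # 1.4826 * MAD approximates a standard deviation for normal data;
        # fall back to 1.0 when the baseline is perfectly flat.
        scale = 1.4826 * median(abs(c - med) for c in baseline) or 1.0
        anomalous = (med - count) / scale > threshold and not near_holiday(day)
        if anomalous and not in_run:
            starts.append(day)
        in_run = anomalous
    return starts


# Hypothetical series: a weekly pattern with a simulated ten-day drop.
start = date(2015, 1, 1)
counts = [1000 + (i % 7) * 10 for i in range(60)]
for i in range(40, 50):
    counts[i] = 100
daily = [(start + timedelta(days=i), c) for i, c in enumerate(counts)]
print([d.isoformat() for d in downward_anomaly_starts(daily)])  # ['2015-02-10']
```

A rolling baseline gradually absorbs a long outage, so a fuller detector would likely freeze or exclude the baseline while a run is flagged; the sketch keeps it simple.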
Client-side tests from Indonesia returned nothing indicating domain or subdomain blocking. Round-trip times were somewhat slow, but still within normal boundaries. One project, mus.wikipedia.org, took more than four seconds to return, but subsequent requests returned in regular time.

Median RTT   Mean RTT   Max RTT
599 ms       565.4 ms   4588 ms (to mus.wikipedia.org)

The server-side data on Indonesia did include one significant anomaly that did not appear related to a public holiday. Traffic to Indonesian and English Wikipedias was significantly lower than normal on July 16, 2015. Further research suggests this might have been related to the eruption of two volcanoes, which caused other disturbances throughout the country. 57

57 "Indonesia closes three airports as two volcanoes erupt," Deutsche Welle, Jul 16, 2015, http://www.dw.com/en/indonesia-closes-three-airports-as-two-volcanoes-erupt/a-18589931.

While it does appear possible that network operators in Indonesia instituted some level of article censorship in the past, our server-side and client-side data analysis did not locate any evidence that censorship of any of Wikipedia’s projects was taking place as of June 2016.

Iran

Internet filtering in Iran is implemented by the Commission to Determine the Instances of Criminal Content (CDICC) and broadly overseen by the Supreme Council of Cyberspace; both groups are primarily composed of members appointed by Supreme Leader Ayatollah Ali Khamenei. 58 Content related to the political opposition, human rights (particularly women’s rights), minorities, religion, and sex is heavily filtered, as are independent and international media, many major social media platforms, and circumvention tools.
59 60 President Hassan Rouhani, elected in 2013, promised during his campaign to "ensure that the people of Iran will comfortably be able to access all information globally" and stated that "all human beings have a right" to use social networks. 61 Despite those statements, Facebook, Twitter, and a number of other platforms remain blocked, though Rouhani’s administration did resist a CDICC order to block WhatsApp in 2014. 62 In 2006, then-president Mahmoud Ahmadinejad announced plans to build a national Internet system, in part to improve the country’s digital infrastructure and increase connection speeds, which are currently among the lowest in the world. 63 The project is considerably behind schedule but is moving forward. One of the project's stated goals is to move the entire country onto a national network, largely disconnected from the greater World Wide Web, to help ensure that Iranian Internet users are accessing "clean" content on domestic Internet hosts. 64

58 "Iranian Internet Infrastructure and Policy Report," Small Media, Apr 2014, https://smallmedia.org.uk/sites/default/files/u8/IIIP_April2014.pdf.
59 "Iran: Freedom on the Net 2015," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom-net/2015/iran.
60 "Iran," OpenNet Initiative, Jun 16, 2009, https://opennet.net/research/profiles/iran.
61 Saeed Kamali Dehghan, "Hassan Rouhani suggests online freedom for Iran in Jack Dorsey tweet," The Guardian, Oct 2, 2013, https://www.theguardian.com/world/iran-blog/2013/oct/02/iran-president-hassan-rouhani-internet-online-censorship.
62 Leyla Khodabakhshi, "Rouhani move over WhatsApp ban reveals Iran power struggle," BBC, May 8, 2014, http://www.bbc.com/news/world-middle-east-27330745.
63 "State of the Internet: Q1 2016 Report," Akamai, Jun 2016, https://www.akamai.com/us/en/multimedia/documents/state-of-the-internet/akamai-state-of-the-internet-report-q1-2016.pdf.
64 "Tightening the Net: Internet Security and Censorship in Iran: Part 1: The National Internet Project," Article19, Mar 2016, https://www.article19.org/data/files/medialibrary/38315/The-National-Internet-AR-KA-final.pdf.

Iran’s current filtering technology is already quite centralized: traffic in and out of the country is routed through the previously state-owned Telecommunications Infrastructure Company, providing the government with the means to monitor online activities, limit access, throttle speeds, and redirect users attempting to access blocked sites. Authorities also employ keyword filtering, SSL man-in-the-middle attacks, and potentially deep packet inspection to manipulate traffic. 65 Iran has intermittently blocked access to the HTTPS version of Wikipedia since it was introduced in 2011; the English and Kurdish versions of the site have also seen temporary blocks.
66 In 2013, researchers used proxy servers in Iran to scan every Persian-language Wikipedia URL (approximately 1.7 million in total) and identified nearly 1,000 blocked articles. Just over 400 of these contained political content; the others involved sex, religion, human rights, arts and culture, media and journalists, academia, profanity, drugs, and alcohol. Over half of the blocked articles were biographies of individuals; approximately half of those were biographies of people the government had arrested, detained, or killed. The study concludes that Wikipedia filtering in Iran is in part keyword-based, triggered when users request URLs that match a blacklist of terms; approximately 200 of the articles were filtered on this basis, while the rest were individually blocked. 67

Because HTTPS prevents censors from seeing which page a user is requesting, the transition to HTTPS-only delivery of content in 2015 should have substantially limited the Iranian government's ability to censor individual Wikipedia articles. Our article-level analysis indicates that this was indeed the case. We note again that article request histories are not broken out by country; however, Wikimedia's data shows that a large share of the traffic to Persian Wikipedia (fa.wikipedia.org) originates in Iran. 68 Borrowing methodology from another Wikimedia research project, 69 we searched our database of anomalies for articles that saw significantly higher levels of traffic starting around June 12, 2015 (the HTTPS-only transition). We then manually reviewed the resulting articles. This step revealed that many of the articles our algorithm detected saw increased traffic because they were moved or renamed around the same time as the transition. After removing those articles from our results, we were left with 22 articles that saw increased traffic after the transition that could not be explained by other means.
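The search for post-transition increases can be approximated as a simple before/after comparison. The sketch below is hypothetical and is neither the method of the cited Wikimedia research project nor our own pipeline: it compares each article's mean daily requests in the 30 days before and after June 12, 2015, keeps articles whose traffic rose by at least a factor of 1.5, and drops titles known to have been moved or renamed, mirroring the manual review step described above. The window length, ratio, input format, and article names are assumptions.

```python
from datetime import date, timedelta
from statistics import mean

HTTPS_ONLY = date(2015, 6, 12)  # HTTPS-only transition date cited in the text


def post_https_increases(histories, renamed, window_days=30, min_ratio=1.5):
    """Return {title: after/before ratio} for articles whose traffic rose.

    histories: dict mapping article title -> {date: daily request count}.
    renamed: titles moved or renamed near the transition; these are excluded
        because a rename alone can shift traffic between titles.
    """
    lo = HTTPS_ONLY - timedelta(days=window_days)
    hi = HTTPS_ONLY + timedelta(days=window_days)
    flagged = {}
    for title, series in histories.items():
        if title in renamed:
            continue
        before = [n for d, n in series.items() if lo <= d < HTTPS_ONLY]
        after = [n for d, n in series.items() if HTTPS_ONLY <= d < hi]
        if not before or not after:
            continue  # not enough data on one side of the transition
        base = mean(before)
        ratio = mean(after) / base if base > 0 else float("inf")
        if ratio >= min_ratio:
            flagged[title] = round(ratio, 2)
    # Largest increases first, as a queue for manual review.
    return dict(sorted(flagged.items(), key=lambda kv: kv[1], reverse=True))


def constant_history(before, after):
    """Hypothetical helper: constant daily counts on each side of the transition."""
    return {HTTPS_ONLY + timedelta(days=k): (before if k < 0 else after)
            for k in range(-30, 30)}


histories = {
    "Article A": constant_history(10, 40),    # traffic quadruples after the switch
    "Article B": constant_history(100, 105),  # essentially unchanged
    "Article C": constant_history(5, 50),     # jumps, but was renamed at the same time
}
print(post_https_increases(histories, renamed={"Article C"}))  # {'Article A': 4.0}
```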
The set of articles Iran was censoring at the time of the transition was certainly larger than this (as evidenced by the study referenced above), and we do not claim that our list is comprehensive. We did find that many of the articles identified by our process belonged to the same categories that were most likely to be censored in the previous research. The set of articles we identified consisted mostly of articles related to sex (fifteen articles, e.g., the Persian equivalents of "Sex" and "Cunnilingus"), but also included political reformers (e.g., the Persian translation of "Mohammad Khatami") and governmental institutions (e.g., the Persian translation of "Army of the Guardians of the Islamic Revolution"). The full list of articles and their English equivalents is included in Appendix C.

65 Simurgh Aryan, Homa Aryan, and J. Alex Halderman, "Internet Censorship in Iran: A First Look," Proceedings of the 3rd USENIX Workshop on Free and Open Communications on the Internet, Aug 2013, https://jhalderm.com/pub/papers/iran-foci13.pdf.
66 "New York Times website unblocked, YouTube still inaccessible," Reporters Without Borders, Dec 7, 2006, http://archives.rsf.org/print.php3?id_article=20016.
67 Nima Nazeri and Collin Anderson, "Citation Filtered: Iran’s Censorship of Wikipedia," Center for Global Communication Studies, Annenberg School for Communication (University of Pennsylvania), Nov 2013, http://www.global.asc.upenn.edu/fileLibrary/PDFs/CItation_Filtered_Wikipedia_Report_11_5_2013-2.pdf.
68 "Wikimedia Traffic Analysis Report - Page Views Per Wikipedia Language - Breakdown," May 2016, https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm#Persian
69 "HTTPS Transition and Article Censorship," Wikimedia, https://meta.wikimedia.org/wiki/Research:HTTPS_Transition_and_Article_Censorship.

Below is a graph of daily traffic to all 22 articles from December 2011 onward:

The uptick in June 2015 is visible, as are two events beginning at the end of December 2011 and the end of March 2012 that affect most articles in the set. It is possible Iranian network operators were testing or otherwise adjusting their censorship capabilities around this time, but we were not able to find documented evidence of this.

Much of our methodology was designed around locating the beginning of censorship events rather than the end. While this did not produce many positive results, we do believe this method identified the start of a censorship event on Persian Wikipedia. The top four anomalies starting on February 26, 2015 for articles in the fa.wikipedia.org domain are:

Article   Translation
ودکا      Vodka
شامپاین   Champagne
عرق       Sweat
تکیلا     Tequila

Three are clearly identifiable as alcoholic beverages, the consumption of which has been illegal in Iran since 1979.
70 The article that translates to "Sweat" is a disambiguation page whose first link is to the article "عرق سگی," which translates to "Aragh Sagi." "Aragh Sagi" is a type of alcohol, and "عرق" translates as both "sweat" and "distillate." We consider specific censorship of a disambiguation page unlikely, and instead suggest that this supports the assertion that filtering in Iran is at least partially keyword-based.

70 Adam Taylor, "Iran is opening 150 alcoholism treatment centers, even though alcohol is banned," Washington Post, Jun 9, 2015, https://www.washingtonpost.com/news/worldviews/wp/2015/06/09/iran-is-opening-150-alcoholism-treatment-centers-even-though-alcohol-is-banned/.

The graph of daily requests for these four articles around this time period is below:

The drop in requests is clearly visible, and while we have not calculated the statistical significance, traffic to each article appears to increase slightly beginning right after the HTTPS-only transition. It is also interesting to note that if this is a censorship event, Iranian censorship officials were actively adding to their lists of blocked content as recently as February 2015, which means these articles were likely censored for only a matter of months.
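To illustrate why a keyword rule could catch a disambiguation page, and why HTTPS defeats such rules, consider the hypothetical filter below. It is not a model of Iran's actual system; the blacklist and the substring-matching rule are assumptions. Over plain HTTP, a network observer can read the full request URL, including the percent-encoded article title; over HTTPS, the path is encrypted and the observer typically sees only the hostname (for example, through the TLS Server Name Indication field), so it can block all of fa.wikipedia.org or none of it.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical blacklist: the Persian word meaning both "sweat" and "distillate".
BLACKLIST = ["عرق"]


def request_url(title, https):
    """Build a Persian Wikipedia article URL; the title is percent-encoded as UTF-8."""
    scheme = "https" if https else "http"
    return f"{scheme}://fa.wikipedia.org/wiki/{quote(title)}"


def visible_to_censor(url):
    """Return the part of a request a network observer can read."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # The path is encrypted; only the hostname leaks (e.g., via SNI).
        return parts.hostname
    return parts.hostname + unquote(parts.path)


def is_blocked(url):
    """Substring keyword match against whatever the observer can see."""
    seen = visible_to_censor(url)
    return any(term in seen for term in BLACKLIST)


# The disambiguation page, "Aragh sagi", and an unrelated article (Tehran).
for title in ["عرق", "عرق_سگی", "تهران"]:
    print(title, is_blocked(request_url(title, https=False)),
          is_blocked(request_url(title, https=True)))
```

Over HTTP both pages containing the keyword match, including the disambiguation page, while the unrelated article passes; over HTTPS none of them can be singled out.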
We also located an event during the spring of 2013 during which more than 20 seemingly unrelated articles saw large falls in traffic (e.g., the following graph depicts traffic to the Persian equivalents of "Psychology," "Immanuel Kant," "Don Quixote," and "Cosmetics"):

We consider this event unlikely to be censorship because, although it occurs slightly later, it resembles many other events across many languages during the spring of 2013 in which numerous unrelated articles saw dramatically decreased traffic before returning to normal levels weeks later. This widespread event is documented in Appendix D.

Requests from Iran to Persian and Kurdish Wikipedias, both of which have reportedly been blocked in the past, do not indicate any significant anomalies in the period from May 2015 to July 2016 apart from what is likely decreased traffic due to the holidays around Nowruz (an Iranian holiday celebrating the Iranian New Year) in late March 2016:

Traffic to English Wikipedia from Iran again shows the likely Nowruz anomaly, along with a rather large drop in traffic around the HTTPS-only transition. This could have a number of causes, though we believe this decrease is less likely to be related to censorship, as similar decreases in traffic around the time of the transition can be seen in traffic from countries not known to have blocked any part of Wikipedia in the past (e.g., Fiji, outlined in Additional Findings below).

We did not have client testing infrastructure in place in Iran. Our research on Iran uncovered evidence supporting the claims of previous researchers that Iran has blocked Wikipedia articles in the past and that many of those were related to sex or Iranian politics. We further suggest that Wikipedia's transition to HTTPS disabled at least some part of this censorship.
While server-side and article-level analysis indicated that portions of Wikipedia had been censored by Iran in the past, as of late June 2016 we found no evidence that this censorship was ongoing, and at least some articles that had likely been censored had been receiving increased traffic since Wikipedia's transition to HTTPS.

Kazakhstan

The most heavily censored content in Kazakhstan is content related to religious extremism. Most blocking happens by court order; throughout 2014, the Prosecutor General's Office asked courts to block 703 websites and 198 specific URLs related to the topic. The most significant recent cases of such censorship were related to domestic and international coverage of Kazakhstan’s association with ISIS. For example, in the fall of 2014, any web pages containing a series of ISIS videos portraying alleged Kazakh nationals as ISIS soldiers were blocked. 71 Although most censorship targets extremist content, popular social media sites have also been targeted in the past; official reasons are rarely given, and ISPs often deny blocking the sites. The blogging platform LiveJournal has been blocked since 2008. 72 Twitter, Facebook, Instagram, and VKontakte were blocked intermittently for short periods of time in 2014. 73

71 "Kazakhstan: Freedom on the Net 2015," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom-net/2015/kazakhstan.
72 Ibid.

There have also been several cases of content removal from YouTube, such as a video of ethnic conflict in South Kazakhstan. Some websites are blocked without any evident court decision, including two major Central Asian news sites, Ca-news (based in Kyrgyzstan) and Fergananews (based in Russia), which are inaccessible for unknown reasons.
74 Article-level analysis of Kazakh Wikipedia discovered a significant number of anomalies, though further investigation suggested all were associated with the public holidays of either Gregorian New Year or Nowruz (beginning around March 20). Analysis of server-side data revealed much the same thing: Client-side tests from Kazakhstan were highly inconsistent, with all projects seeing a large number of intermittent errors. These intermittent errors occurred on all tested domains, pointing to an error in the testing node rather than any external issues. Despite this fact, after repeated requests, we were able to successfully receive responses from all Wikipedia projects. The timing of the network requests did not indicate anything out of the ordinary: Median RTT Mean RTT Max RTT 226 ms 239.5 ms 878 ms to iu.wikipedia.org We were unable to locate any evidence that Wikipedia or any of its projects were being censored in Kazakhstan as of June 2016. Pakistan In March 2015, Pakistan’s prime minister gave authority over Internet filtering in the country to the Pakistan Telecommunication Authority, skirting existing legislation that vests this power in the Inter-Ministerial Committee for the Evaluation of Web Sites (IMCEW). 75 The change does not yet appear to have affected the country’s Internet filtering regime, which is "inconsistent and 73 "Kazakhstan blocked Facebook, Instagram, twitter and Vkontakte for several hours" [in Russian], TJournal, Nov 28, 2014, https://tjournal.ru/p/kazakhstan-total-block. 74 "Kazakhstan: Freedom on the Net 2015," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom- net/2015/kazakhstan. 75 "Pakistan: Freedom on the Net 2015," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom- net/2015/pakistan. 
https://tjournal.ru/p/kazakhstan-total-block https://freedomhouse.org/report/freedom-net/2015/kazakhstan https://freedomhouse.org/report/freedom-net/2015/kazakhstan https://freedomhouse.org/report/freedom-net/2015/pakistan https://freedomhouse.org/report/freedom-net/2015/pakistan 23 Analyzing Accessibility of Wikipedia Projects Around the World INTERNET MONITOR intermittent" 76 but generally targets topic areas that threaten national security or are religiously blasphemous. Access to international news organizations and independent media is generally open, as is access to the websites of human rights organizations, local civil society groups, and Pakistani political parties. Since 2011, all online pornography has been banned, a block that has also affected some sex education and health websites. 77 YouTube has been largely blocked since 2012, when an anti-Islamic video garnered attention throughout the Muslim world. 78 In January 2016, a localized version of YouTube was created that allows the Pakistani government to monitor and take down content deemed inappropriate. 79 In 2013, Citizen Lab researchers documented the use of Netsweeper filters to block political, social, and religious on the network of Pakistan Telecommunication Company Limited, the largest telecommunications company in the country. 80 Facebook and Twitter have received public criticism in the West for limiting access to content at the request of the Pakistani government; 81 in 2014, both platforms republished previously blocked content. Wikipedia is generally accessible, but was blocked for a few hours in 2006 and for several days in 2010. 82 83 Attributing historical article-level censorship to Pakistan is difficult. As 98% of Wikipedia requests are directed at English Wikipedia, 84 and our current data does not allow us to separate Pakistani requests to English Wikipedia from requests from other countries, we have little-to-no ability to detect Pakistani article censorship. 
Our client test node in Pakistan was able to access all Wikipedia projects in a timely fashion:

Median RTT   Mean RTT   Max RTT
281 ms       345.1 ms   1027 ms (to am.wikipedia.org)

76 "Pakistan," OpenNet Initiative, Aug 6, 2012, https://opennet.net/research/profiles/pakistan.
77 "Pakistan: Freedom on the Net 2015," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom-net/2015/pakistan.
78 Jon Boone, "Dissenting voices silenced in Pakistan's war of the web," The Guardian, Feb 18, 2015, https://www.theguardian.com/world/2015/feb/18/pakistan-war-of-the-web-youtube-facebook-twitter.
79 Tommy Wilkes, "Pakistan Lifts Ban on YouTube After Launch of Local Version," Reuters, Jan 19, 2016, http://www.reuters.com/article/us-pakistan-youtube-idUSKCN0UW1ER.
80 "O Pakistan, We Stand on Guard for Thee: An Analysis of Canada-based Netsweeper’s Role in Pakistan’s Censorship Regime," Citizen Lab, Jun 20, 2013, https://citizenlab.org/2013/06/o-pakistan/.
81 "Pakistan - Government Requests Report," Facebook, Jan 2014 - Jun 2014, https://govtrequests.facebook.com/country/Pakistan/2014-H1/.
82 "Websites blocked, PTA tells SC: Blasphemous material," Dawn, Mar 14, 2006, http://www.dawn.com/news/183047/websites-blocked-pta-tells-sc-blasphemous-material.
83 "Pakistan blocks access to YouTube in internet crackdown," BBC, May 20, 2010, http://www.bbc.com/news/10130195.
84 "Wikimedia Traffic Analysis Report - Wikipedia Page Views Per Country - Breakdown," May 2016, https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryBreakdown.htm#Pakistan.

The only significant downward project-level anomalies in traffic to English Wikipedia from Pakistan appear to correspond to the holidays around Ramadan and Eid al-Adha. Based on our client tests and Wikipedia data, as of June 2016, we had no firm evidence that any Wikipedia project was being blocked or limited in Pakistan.

Russia

Over the past few years, the Russian government has systematically moved to increase its control over the online information environment, passing new legislation that expands authorities’ power to access user data, monitor online activity, and block and take down websites. 85 OpenNet Initiative testing in 2010 found evidence of filtering only of sexually explicit content, and no evidence of political filtering. 86 In the past six years, filtering has grown dramatically and now includes opposition websites, content related to the 2014 conflict in Ukraine and other political protests and events, "extremist" content, and information about drugs and suicide. 87 The federal agency Roskomnadzor, tasked with supervising electronic media in the country, maintains a blacklist of blocked sites; several Wikipedia articles in both Russian and English, most related to drugs or suicide, have reportedly appeared on the list since 2012. 88

In July 2012, editors of Russian-language Wikipedia shut down the site for 24 hours to protest pending legislation that would increase the government’s powers to block online content. 89 This event appeared in our article-level analysis as the most anomalous event for Russian Wikipedia. On July 10, 2012, there were significant decreases in traffic across more than 1,000 articles, and these decreases disappeared the next day:

85 Andrey Tselikov, "The Tightening Web of Russian Internet Regulation," Berkman Center for Internet & Society (Harvard University), Nov 2014, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2527603.
86 "Russia," OpenNet Initiative, Dec 19, 2010, https://opennet.net/research/profiles/russia.
87 "Russia: Freedom on the Net 2015," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom-net/2015/russia.
88 "Wikipedia Pages in the Unified Register of Banned Sites" [in Russian], Wikipedia, https://ru.wikipedia.org/wiki/Википедия:Страницы_Википедии,_внесённые_в_Единый_реестр_запрещённых_сайтов.
89 "Russia's Wikipedia shuts down for 24 hours," ABC News Online, Jul 10, 2012, http://www.abc.net.au/news/2012-07-11/russias-wikipedia-shuts-down-for-24hrs/4122664.

The fact that about two-thirds of all traffic to Russian Wikipedia originates in Russia 90 supports the conclusion that this event was indeed related to the protest.

In August 2015, access to ru.wikipedia.org was temporarily blocked after Russian Wikipedia did not meet Roskomnadzor’s demands to remove an article about a type of cannabis. The site’s use of HTTPS meant that Internet service providers were unable to block the individual offending page and would therefore have to block all of Russian Wikipedia. 91 The block lasted for several hours before Roskomnadzor announced that the article had been sufficiently edited to meet its guidelines, though Wikipedia editors said the page remained the same. 92 The decrease in traffic that this ban likely caused was not detected by our algorithm on either the article level or the level of Russian Wikipedia as a whole.

90 "Wikimedia Traffic Analysis Report - Page Views Per Wikipedia Language - Breakdown," May 2016, https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm#Russian.
91 Amar Toor, "Russia banned Wikipedia because it couldn’t censor pages," The Verge, Aug 27, 2015, http://www.theverge.com/2015/8/27/9210475/russia-wikipedia-ban-censorship.
92 Shaun Walker, "Russia briefly bans Wikipedia over page relating to drug use," The Guardian, Aug 25, 2015, https://www.theguardian.com/world/2015/aug/25/russia-bans-wikipedia-drug-charas-https.

Most of the remaining large anomalous events we detected in article traffic occurred around holidays, most notably the New Year. Across all the Wikipedia projects we looked at, the holiday effect appeared strongest in Russian Wikipedia. Thousands of articles had large decreases starting near the end of 2011, 2012, 2013, 2014, and 2015. The graph of the number of views to all of ru.wikipedia.org from Russia shows one of these strong New Year anomalies:

The remaining anomalies detected by our article-level analysis are unexplained. The most significant of these unexplained events consisted of 261 articles dropping off around February 18, 2015, with 185 articles dropping off on the eighteenth itself.
For many of these articles, traffic did not immediately recover. A graph of four of these articles is below:

"Заглавная страница" is the Russian equivalent of English Wikipedia's Main Page, and saw an average of more than 800,000 requests per day prior to this event. The decrease in traffic may have been related to the conflict between Russia and Ukraine that was taking place at the time. The average number of monthly requests to Russian Wikipedia from Ukraine for December 2014 and January 2015 was 53,471.5, while the average for February, March, and April of 2015 was 32,105. 93 We were unable to find any other evidence to support this hypothesis.

Russian Wikipedia also contained a relatively large number of anomalies that were limited in scope to single articles. Investigation revealed that in most of these cases, the articles in question had been deleted or moved (e.g., "Нагота" ["Nudity"] on July 9, 2014, "Косово" ["Kosovo"] on January 8, 2013, and "Массовое_убийство" ["Mass Murder"] on February 20, 2014). The anomaly detection algorithm picked these up because they often had a significant amount of traffic prior to deletion. After manually removing from analysis those articles that had plausible explanations for traffic drops, we were still left with a number of articles with unexplained significant traffic drops:

Start Date   Article              Translation
2013-05-11   Уэйко                Waco
2013-06-02   Камасутра            Kamasutra
2013-07-30   Кроманьонцы          Cro-Magnon
2013-11-01   Безработица          Unemployment
2013-11-14   Анис_обыкновенный    Anise

Sample graphs of anomalies detected in "Unemployment" and "Kamasutra" are below:

93 "Wikimedia Traffic Analysis Report - Wikipedia Page Views Per Country - Breakdown," Dec 2014 - April 2015, Wikimedia, https://stats.wikimedia.org/archive/squid_reports/2015-02/SquidReportPageViewsPerCountryBreakdownHuge.htm#Ukraine.
From our client test node in Russia, we were able to access all Wikipedia project subdomains successfully and reliably. Round-trip times for network requests were the fastest of any node we tested:

Median RTT   Mean RTT   Max RTT
96 ms        118.8 ms   688 ms (to bxr.wikipedia.org)

While Russia has actively censored portions of the Internet, and as of June 2016 that censorship appeared to be growing, we found no evidence that Russia was interfering with traffic to Wikipedia at either the article or project level.

Saudi Arabia

In 2014, Reporters Without Borders ranked the Kingdom of Saudi Arabia 164th out of 180 countries in terms of press freedom, emphasizing that the Kingdom is "relentless in its censorship of the Saudi media and the Internet." 94 All international Internet traffic is routed through two national providers, Integrated Telecom Company and Bayanat al-Oula for Network Services, giving the government the ability to review and filter requests. 95 The Communications and Information Technology Commission oversees Internet filtering in the country, and the list of content blocked in the country is long. First, Saudi Arabia uses commercially available software (SmartFilter 96) to locate URLs related to pornography, gambling, and drugs, which it then blocks. It also maintains a local list of URLs separate from this categorization mechanism. 97 This list reportedly contains a broader set of content, including content related to violent extremism, criticism of Gulf royal families, political opposition, censorship circumvention tools, P2P file sharing tools, LGBT issues, human rights organizations, religious scholars (especially those related to the minority Shi'a faith), mirror sites, and unlicensed online publications. 98 99 It is unclear how willing Saudi authorities are to block entire sites over single pieces of content. In 2012, the government threatened to block YouTube if a controversial video was not taken down, but the block did not occur because YouTube removed the video in question. 100 Internet restrictions in Saudi Arabia are not limited to content filtering; a 2009 law led to the installation of hidden cameras in all web cafes to track users, and self-censorship among online writers is widespread. 101 The government regularly arrests those who use social media to document human rights abuses, express political opinions critical of the ruling family, or criticize the official religion; those who are convicted are sentenced to jail time and, in at least one case, corporal punishment. 102

94 "World Press Freedom Index 2014," Reporters Without Borders, Jan 31, 2014, https://rsf.org/sites/default/files/index2014_en.pdf.
95 "Saudi Arabia: Freedom on the Net 2015," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom-net/2015/saudi-arabia.
96 Jakub Dalek, et al., "A Method for Identifying and Confirming the Use of URL Filtering Products for Censorship," ACM IMC, Oct 2013, http://conferences.sigcomm.org/imc/2013/papers/imc112s-dalekA.pdf.
97 "General Information on Filtering Service," Saudi CITC, http://www.internet.sa/en/general-information-on-filtering-service.
98 "Saudi Arabia: Freedom on the Net 2015," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom-net/2015/saudi-arabia.
99 "Internet Filtering in Saudi Arabia," OpenNet Initiative, Aug 6, 2009, https://opennet.net/research/profiles/saudi-arabia.
100 "YouTube blocks 'Innocence of Muslims' in Saudi Arabia," Al Arabiya News, Sep 19, 2012, http://english.alarabiya.net/articles/2012/09/19/238987.html.
101 "Internet Filtering in Saudi Arabia," OpenNet Initiative, Aug 6, 2009, https://opennet.net/research/profiles/saudi-arabia.
102 Ben Beaumont, "7 Ways Saudi Arabia is Silencing People Online," Amnesty International, Apr 9, 2015, https://www.amnesty.org/en/latest/campaigns/2015/04/7-ways-saudi-arabia-is-silencing-people-online/.

In 2006, Saudi Internet users started reporting the censorship of a number of Wikipedia pages in both English and Arabic, mostly related to sexual content. 103 When the blocking occurred, some Saudi citizens felt that some of the pages were unfairly blocked and contained "beneficial" content. 104 Arabic and English Wikipedia together account for more than 95% of the requests from Saudi Arabia, 105 and our analysis did not show the type of traffic anomaly that would be indicative of domain blocking over the period from May 2015 to July 2016.

103 "List of Wikipedia articles censored in Saudi Arabia," Wikipedia, https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedia_articles_censored_in_Saudi_Arabia.
104 Hassana’a Mokhtar, "What is Wrong With Wikipedia," Arab News, Jul 19, 2006, https://web.archive.org/web/20110807060237/http://archive.arabnews.com/?page=1&section=0&article=85616&d=19&m=7&y=2006.
105 "Wikimedia Traffic Analysis Report - Wikipedia Page Views Per Country - Breakdown," May 2016, https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryBreakdown.htm#Saudi Arabia.
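Project-level domain blocking would be expected to show up as a sharp, sustained fall in a country's daily requests rather than a one-day dip. The following Python sketch shows one simple way to screen a daily series for that pattern. The trailing-median baseline, the 7-day window, the 50% threshold, and the two-day minimum run are all hypothetical choices, not this study's detection algorithm. Holiday effects such as Nowruz or the New Year can trigger the same flag, so flagged runs are leads for manual review rather than evidence of censorship.

```python
from statistics import median


def low_days(counts, window=7, threshold=0.5):
    """Return indices of days whose request count falls below `threshold` times
    the median of the preceding `window` days (a hypothetical rule)."""
    return [
        i
        for i in range(window, len(counts))
        if counts[i] < threshold * median(counts[i - window:i])
    ]


def sustained_runs(days, min_run=2):
    """Group consecutive day indices into (first, last) runs lasting at least
    `min_run` days; isolated one-day dips are dropped."""
    runs = []
    start = prev = None
    for day in days:
        if start is not None and day == prev + 1:
            prev = day  # extend the current run
            continue
        if start is not None and prev - start + 1 >= min_run:
            runs.append((start, prev))  # close the previous run if long enough
        start = prev = day  # begin a new run
    if start is not None and prev - start + 1 >= min_run:
        runs.append((start, prev))
    return runs


# Hypothetical daily request counts: a three-day outage after ten normal days.
daily = [100] * 10 + [10, 12, 11] + [100] * 5
print(low_days(daily))                  # [10, 11, 12]
print(sustained_runs(low_days(daily)))  # [(10, 12)]
```

Using a median rather than a mean for the baseline keeps a few low days inside the window from dragging the baseline down immediately, so a multi-day outage stays flagged for its whole length.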
We were able to access all Wikipedia subdomains from our client test point in Saudi Arabia, and all round-trip times were within normal ranges:

Median RTT   Mean RTT   Max RTT
283.5 ms     308.6 ms   1142 ms (to am.wikipedia.org)

Our article-level analysis is not segmented by country, and while the largest share of requests to Arabic Wikipedia comes from Saudi Arabia, that share is only approximately one-fifth. 106 Even if we were to locate likely censorship events in Arabic Wikipedia, it would be impossible without additional data to definitively attribute that censorship to Saudi Arabia. Given the results of our client and server data analysis, as of June 2016 we had no firm evidence that Saudi Arabia was censoring any Wikipedia domain or subdomain.
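Each client-test table in this report gives the median, mean, and maximum round-trip time along with the subdomain that produced the maximum. The following Python sketch shows how such a summary could be derived, assuming one RTT measurement per Wikipedia subdomain from a single test node. The function name and sample values are hypothetical, and the study's own aggregation (for example, whether repeated requests to a subdomain were combined first) is not described in this section.

```python
from statistics import mean, median


def summarize_rtts(rtts_ms):
    """Summarize one round-trip time (in ms) per Wikipedia subdomain, measured
    from a single client test node, as median, mean, and max plus the slowest
    subdomain."""
    values = list(rtts_ms.values())
    slowest = max(rtts_ms, key=rtts_ms.get)  # subdomain with the largest RTT
    return {
        "median_ms": median(values),
        "mean_ms": round(mean(values), 1),
        "max_ms": rtts_ms[slowest],
        "max_domain": slowest,
    }


# Hypothetical measurements, not data from this study.
sample = {
    "ar.wikipedia.org": 250,
    "en.wikipedia.org": 260,
    "fr.wikipedia.org": 300,
    "ja.wikipedia.org": 900,
}
print(summarize_rtts(sample))
# {'median_ms': 280.0, 'mean_ms': 427.5, 'max_ms': 900, 'max_domain': 'ja.wikipedia.org'}
```

A single slow subdomain raises the mean and the maximum while barely moving the median, which is why reporting all three figures helps separate one slow response from a node-wide slowdown.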
South Korea

South Korea’s Internet filtering regime is largely focused on its relations with North Korea and on sexually explicit content. The majority of banned websites are North Korean news organizations or sites run by North Korean "sympathizers," but pornography and LGBT websites are also widely banned. 107 The National Security Act in Cyberspace prohibits, among other things, "sympathizing" with North Korea online; more than 100 people were convicted of this crime between 2012 and 2014. 108 South Korea’s constitution states that "neither speech nor the press may violate the honor or rights of other persons nor undermine public morals or social ethics." 109 These restrictions have been used to justify censoring attacks against politicians, sites connected to North Korea, and pornography sites. The government’s decision to ban online gaming for six hours each day for citizens younger than sixteen also loosely falls under this guideline. 110

106 "Wikimedia Traffic Analysis Report - Page Views Per Wikipedia Language - Breakdown," May 2016, https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm#Arabic.
107 "South Korea," OpenNet Initiative, Aug 6, 2012, https://opennet.net/research/profiles/south-korea.
108 Ibid.
109 "Freedom on the Net 2015: South Korea," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom-net/2015/south-korea.
110 "Why South Korea is really an internet dinosaur," The Economist, Feb 10, 2014, http://www.economist.com/blogs/economist-explains/2014/02/economist-explains-3.
The Korea Communications Standards Commission (KCSC) is in charge of regulating the Internet, but in 2014 the Public Prosecutor’s office set up an investigative unit charged with monitoring online slander and rumors. 111 South Korea has a history of defamation cases involving the Internet; in 2012 a National Intelligence Service (NIS) agent removed Twitter accounts that were critical of Park Geun-hye, who was running for president at the time. 112 Just two years later, Han Sun-Kyo, a conservative lawmaker, attempted to pass a law that would prevent "rumor mongering" in the wake of the capsizing of the Sewol ferry, which left over 300 people dead. 113 Penalties for online defamation are severe, with fines at times reaching $45,000 USD. 114

The article-level analysis we conducted revealed some anomalies that could not be attributed to changes to the articles themselves. Only two anomalies occurred at approximately the same time, in "회음" ("Perineum") and "불가지론" ("Agnosticism"):

While it is interesting that traffic to both articles dropped significantly on the same day, the fact that this anomaly was limited to these two articles and that they are not closely related thematically makes us doubt that this was a censorship event. There were other anomalous events for single articles throughout our analysis, the most significant of which were "엔진오일" ("Engine Oil") starting on February 19, 2014 and "아가" ("Song of Songs") starting on May 17, 2014.
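The South Korean results rest on distinguishing anomalies that many articles share on the same date from anomalies confined to one or two articles. The following Python sketch groups per-article anomaly start dates to make that distinction. The input format, the exact-date grouping, and the three-article cutoff are hypothetical assumptions rather than this study's method; a fuller version would also merge dates that fall close together, as with the Russian event spread around February 18, 2015.

```python
from collections import Counter
from datetime import date


def classify_start_dates(anomalies, min_articles=3):
    """Split anomaly start dates into widespread events (many articles dropping
    on the same day) and isolated ones.

    `anomalies` is a list of (article_title, start_date) pairs;
    `min_articles` is a hypothetical cutoff for calling a date widespread.
    """
    counts = Counter(start for _title, start in anomalies)
    widespread = sorted(d for d, n in counts.items() if n >= min_articles)
    isolated = sorted(d for d, n in counts.items() if n < min_articles)
    return widespread, isolated


# Hypothetical anomalies: three articles dropping together, two on their own.
anomalies = [
    ("Article A", date(2014, 3, 1)),
    ("Article B", date(2014, 3, 1)),
    ("Article C", date(2014, 3, 1)),
    ("Article D", date(2014, 6, 10)),
    ("Article E", date(2014, 9, 22)),
]
widespread, isolated = classify_start_dates(anomalies)
print(widespread)  # [datetime.date(2014, 3, 1)]
print(isolated)    # [datetime.date(2014, 6, 10), datetime.date(2014, 9, 22)]
```

Widespread dates then call for a check against holidays or project-wide events, while isolated dates call for a check of the article's own history (deletion, moves, renames) before any censorship hypothesis is considered.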
We were able to access all Wikipedia project subdomains from our test location in South Korea with no problems. Response times for each of the domains were within typical ranges:

Median RTT   Mean RTT   Max RTT
301 ms       316.6 ms   1286 ms (to als.wikipedia.org)

111 "Freedom on the Net 2015: South Korea," Freedom House, Oct 2015, https://freedomhouse.org/report/freedom-net/2015/south-korea.
112 Ibid.
113 Ibid.
114 Ibid.

The history of requests from South Korea to both Korean and English Wikipedias over the period of analysis appears regular, with no signs of outages:

As of June 2016, we were unable to find any strong evidence that South Korea has censored or was censoring any of Wikipedia's articles or projects.

Syria

Syrian netizens experience extensive online censorship of content related to politics, minorities, human rights, and foreign affairs. Examples of censored content include the London-based news outlets Al-Quds al-Arabi and Asharq al-Awsat, many Lebanese online newspapers, websites campaigning to end Syrian influence in Lebanon, WhatsApp, the Muslim Brotherhood, websites that advocate for the Kurdish minority, and the entire Israeli top-level domain ".il." Websites related to human rights awareness such as the Violations Documentation Center are also blocked. 115 According to the Wall Street Journal in 2011, out of 2,500 attempts to visit Facebook, two-fifths were permitted and three-fifths were blocked. 116 Censorship also extends to mobile communication: Bloomberg reported in 2012 that a special government unit known as Branch 225 had ordered Syrian mobile providers to block text messages containing words like "revolution" or "demonstration." 117 The fact that both YouTube and some pages on Facebook remain accessible makes activists suspect that the current regime is trying to track citizens’ online activities. Other social media applications like the VoIP service Skype suffer from disruptions due either to low speeds or to intermittent blocking by the authorities. Over the past decade, authorities have detained hundreds of Internet users, including several well-known bloggers and citizen journalists. 118 Wikipedia in Arabic was reportedly blocked from April 2008 until February 2009, but other languages remained accessible. 119

115 "Syria: Freedom on the Net 2015," Freedom House, May 2015, https://freedomhouse.org/report/freedom-net/2015/syria.
116 Jennifer Valentino-Devries, Paul Sonne, and Nour Malas, "U.S. Firm Acknowledges Syria Uses Its Gear to Block Web," Wall Street Journal, Oct 29, 2011, http://on.wsj.com/t6YI3W.
117 Ben Elgin and Vernon Silver, "Syria Disrupts Text Messages of Protesters With Dublin-Made Equipment," BloombergBusiness, Feb 14, 2012, http://bloom.bg/1i0TOEU.
118 "Syria: Freedom on the Net 2015," Freedom House, May 2015, https://freedomhouse.org/report/freedom-net/2015/syria.
119 "Syrian Youth Break Through Internet Blocks," IWPR, http://www.css.ethz.ch/content/specialinterest/gess/cis/center-for-securities-studies/en/services/digital-library/articles/article.html/88422.
Using our methodology and the data available, article-level censorship would be difficult to attribute to Syria, as Arabic Wikipedia is accessed heavily from many countries. We did not have a client test node in Syria. Our analysis of server-side data detected no significant anomalies in traffic from Syria to any Wikipedia project. Nevertheless, we conducted a manual review of Arabic and English Wikipedia because they are the most popular Wikipedia projects in Syria, together accounting for approximately 98% of traffic. 120 The number of requests to these Wikipedia projects shows no significant anomalies between May 2015 and July 2016:

While censors