J. E. E. Pettit, Fenggang Yang and Yuki Huang: Developing a Database of Religions in Contemporary China
Over the past decade, information about religions in China has been piecemeal and has differed greatly between government news and social media. The past few years have seen a rise in the destruction and closing of hundreds of churches, the suppression of Tibetan monasteries, and the re-education of Muslim minorities. Yet it has been difficult to gauge how representative these events are at the national or regional level. Exactly how many churches, mosques, and temples exist in China, and when were these communities founded? Are religious communities increasing in the face of government suppression, or do these news stories indicate a decline or disappearance of religion from the landscape of contemporary China? Are these policies affecting all of China’s religions in the same way, or are certain religions developing differently among China’s distinct cultural regions?
Our research team at the Center on Religion and Chinese Society (CRCS) at Purdue University developed digital tools as a means to count the religious communities in China. It seemed that digital mapping tools such as Google Maps could give our team a basic distribution of the sites throughout China and help determine how changing demographics correspond to changes in the religious composition of a country. But while mapping technologies might open up new datasets on religion in the United States and other developed nations, the data we could collect on China was limited and difficult to assemble into a database of sites. Not only does China’s government hold a tight grip over the data it collects, but technology giants (e.g., Baidu, Weibo) also provide users with comparatively little access to what Chinese people are thinking and doing with regard to religion.
As we started surveying the available geographic data on China’s religious communities, we quickly realized that our desire to map the locations of mosques, churches, and temples could have adverse effects not only on our research team, but also on the communities we studied. During our survey, changes in official policy towards religious institutions made discussing religions, either in the news or through social media, a potentially dangerous affair. The repression of religious institutions has been localized in certain cities or provinces, but there is concern that government suppression of religion makes it precarious for groups to maintain a strong public presence online.
In 2015, researchers at CRCS embarked on a four-year project to document Chinese religious institutions and to develop digital tools for understanding where religious establishments are located in the country and which parts of the country are witnessing a growth or decline of religion. We received a generous grant from the John Templeton Foundation that allowed us to bring together a research team of graduate students and postdoctoral fellows. At first, we thought we might simply fund researchers to visit China and document sites with GIS-enabled cameras and other equipment. We quickly discovered, however, that researchers who had conducted similar geographic studies of China had put themselves in peril. Three British geology students from Imperial College London, for example, had been fined for making similar geographic surveys in northwestern China. Similar groups of Japanese and Korean scholars incurred heavy penalties for researching geographic places, and one University of Chicago professor was even jailed for conducting research on the geographic positions of oil wells in west China. Despite our great interest in knowing more about the spatial distribution of religions in China, we were unwilling to place graduate students, affiliated researchers, or ourselves in harm’s way. Given our need to document Chinese religions from afar by using digital resources, we needed collaborators who could help us build tools to collect big data about religion. At present, we have culled information on over 150,000 points from an economic census, from “points-of-interest” in online mapping services such as Amap, and from the State Administration of Religious Affairs (guojia zongjiao shiwuju 国家宗教事务局, hereafter SARA). There would be no way to gather this kind of data about religious sites if we were to collect the points one by one. We needed to develop a way to process big data automatically and clean problematic points.
Our desire to map the religious sites of China involved more than simply overcoming the mechanical issues of producing accurate local-level data on Chinese religions. By making our dataset public, we wanted to be sure that our project was not a “commodity to be controlled,” but rather a tool that could be a “social good to be shared and reused.” Our dataset needed to be grounded in computational technologies and to serve as a foundation from which critical analysis could grow. As Bigenheimer and Elwert demonstrate elsewhere in this volume, accessibility to these kinds of datasets is key to moving the study of religious institutions forward. Yet releasing information on religious institutions also has the potential to expose their communities to unwelcome scrutiny from the government. We conclude that datasets on the geographic position of Chinese religious institutions need to be handled with some caution. A researcher would not want to inadvertently place a community at risk. For our dataset, we conclude that there is little risk, since the Chinese government itself produced the information. These considerations about the ethical implications can hopefully assist future generations of scholars who face the tension between public accessibility and personal security.
1. China’s 2004 Economic Census and Religious Sites
In the People’s Republic of China, five religions are officially recognized and legally approved by government authorities: Daoism, Buddhism, Catholicism, Protestantism, and Islam. The Chinese government has established a complicated set of institutions to regulate religion. The major control apparatus for religious affairs is the United Front Department (tongzhanbu 统战部) of the Chinese Communist Party (CCP), which makes religious policy and rallies religious leaders around the CCP. The daily administration of religious affairs lies with the Religious Affairs Bureau (RAB), including a national-level bureau called SARA. Its duties include processing requests to open temples, churches, and mosques; approving special religious gatherings and activities; and approving the appointment of leaders of religious associations. Meanwhile, the Ministry of Public Security (gong’anbu 公安部) deals with illegal religious activities and watches some religious groups and active leaders, and the Ministry of State Security (guo’an bu 国安部) monitors religious activities involving foreigners.
The presence of these regulatory bodies in China has created a complicated religious field composed of legal and illegal groups, as well as those in between. In the last several decades, the number of religious believers in China has continuously increased. Moreover, religious believers have engaged in strenuous contention with the party-state to reclaim religious sites, restore and construct buildings on traditionally religious sites, and sometimes occupy state or collectively owned spaces and construct new religious buildings. As early as 2009, Dr. Shuming Bao from the China Data Center approached us with a dataset extracted from a 2004 Chinese economic census. The file includes 72,887 religious sites from all of China’s 31 provinces, provincial-level regions, and municipalities. From the outset, we knew this list did not represent the locations of all religious sites across the country. First, this economic census only listed churches, temples, or mosques that were officially registered with the government. But there are many religious communities in China that cannot or choose not to register with the government authorities. In parts of China that we have visited, there are certainly more unregistered groups than officially sanctioned ones. Second, given the economic nature of the census, there were a great number of places whose annual income was not large enough for the authorities to include them in this census. For some northwestern cities such as Lanzhou, for instance, the census focuses primarily on mosques that double as economic enterprises. When we visited the city in 2016, we discovered nearly three dozen churches not mentioned in the census, and many of them already existed in 2004, when the economic census was conducted.
While the 2004 economic census might not give us a comprehensive look at all religious sites in China, we were convinced that it would provide scholars with the best available dataset to analyze the spatial distribution of Chinese sites, as well as the names, locations, relative size, and reported annual income of religious communities throughout China. In an effort to map these sites, we first took the address of each site and used Python code to look it up through the Google and Baidu map APIs. Overall, this automatic process was able to correctly identify the addresses for two-thirds of the sites. A close inspection, however, revealed unforeseen issues. Both Google and Baidu maps were using the administrative categories of 2010, but the economic census used an older address system adopted in 2000. This caused over 8,000 addresses to appear as discrepancies, i.e., a town belonged to one prefecture in 2000 but was identified in a different prefecture by the mapping services.
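The kind of discrepancy described above can be illustrated with a minimal sketch. It assumes the geocoder results have already been downloaded, so no map API is called here; the `Site` record, its field names, and the placeholder prefecture names are hypothetical and do not come from the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Site:
    site_id: int
    census_prefecture: str    # prefecture in the census address (2000 system)
    geocoded_prefecture: str  # prefecture returned by the geocoder ("" if no match)


def classify(site: Site) -> str:
    """Label a site so that only doubtful matches go on to manual review."""
    if not site.geocoded_prefecture:
        return "not_found"
    if site.census_prefecture != site.geocoded_prefecture:
        return "prefecture_mismatch"
    return "ok"


# Placeholder records; 甲市 and 乙市 stand in for real prefecture names.
sites = [
    Site(1, "甲市", "甲市"),
    Site(2, "甲市", "乙市"),  # town reassigned to another prefecture after 2000
    Site(3, "乙市", ""),      # geocoder returned nothing
]

needs_review = [s.site_id for s in sites if classify(s) != "ok"]
print(needs_review)
```

A check of this sort does not say which source is right; it only narrows the dataset down to the points that a person needs to look at.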
As we delved deeper into the addresses and names of the religious sites, we noticed a much more perplexing problem. We found numerous examples where the name of a site indicated that it was located in one village, but the village name in the address pointed to a different location. For site #19238 (a church), for example, the name of the religious establishment places it in Zhangzhuang Village 张庄村 in Zengfumiao Town 增府庙乡 (Changge City 长葛市), whereas the address of the same site places it in Niutang Village 牛堂村 of Changge City 长葛市. There were hundreds of such inconsistencies in the data that needed to be examined on a case-by-case basis. Finally, the Chinese government has made many revisions to the administrative boundaries of counties throughout China. Many of the sites included in the census might have belonged to one administrative division when the data was collected but were moved to another in subsequent years. Sometimes the address was misspelled, or it turned out that there were two villages with the same name in the same region.
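A name-versus-address conflict like the one in site #19238 can be detected mechanically. The following is a minimal sketch, not the project's own code: it assumes that place names end in one of a few administrative suffix characters, and the two input strings are hypothetical reconstructions built from the place names mentioned above rather than the census's actual field values.

```python
import re

# A run of characters followed by one administrative suffix
# (province, city, county, district, township, town, village).
ADMIN_UNIT = re.compile(r"([^省市县区乡镇村]+)([省市县区乡镇村])")


def village_of(text):
    """Return the last village-level name (ending in 村) found in text, or None."""
    villages = [name + unit for name, unit in ADMIN_UNIT.findall(text) if unit == "村"]
    return villages[-1] if villages else None


def in_conflict(site_name, address):
    """True when both fields name a village and the two villages differ."""
    a, b = village_of(site_name), village_of(address)
    return a is not None and b is not None and a != b


site_name = "长葛市增府庙乡张庄村基督教堂"  # hypothetical name field
address = "长葛市牛堂村"                  # hypothetical address field
print(village_of(site_name), village_of(address), in_conflict(site_name, address))
```

The suffix-based split is deliberately naive: a place name that itself contains one of the suffix characters would be cut in the wrong place, which is one reason such flags still need human review.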
Aside from the inconsistencies in the data, we also faced logistical issues in coordinating a team of researchers to clean the data. First, we wanted at least two people on our research team to examine the problematic points as a safeguard against misreading the discrepancies in the data. Given the high volume of errors (roughly 14,000), we needed a system for two editors to evaluate the problematic sites and for a third person to judge whether the conclusions they reached were the same or different. Furthermore, there would be anywhere between 5–10 different editors working on the dataset at any given time. This required a database that would avoid duplication and help automate the process of queuing points for review. Finally, we wanted the final dataset that we released publicly to include simplified and traditional characters, as well as English translations. This would increase its accessibility beyond scholars in China and hopefully make the data intelligible not only to scholars but also to journalists, policymakers, and the general public.
2. The Translator Tool
To solve these issues, we developed a two-step process to prepare the data for public release. The first step was a translator tool for translating and preparing the data for deep editing; the second was an editor tool that would help us determine the most accurate location possible. We developed both web applications by hosting the data on a server with two parts: a development side where we could test our application and a production side that the research team would use. Each contained three components. First, there was an HTML layer that stored the source code of the projects. This layer held the templates for the web interface. If the data for our project were stored in this layer, third parties might be able to change our data without authorization. To mitigate this problem, we created a DATA layer that stored the PHP files for the database connection. We could then manage the credentials by remote connection via different web editors or IDEs (e.g., NetBeans, PhpStorm). Finally, a MySQL database contained the metadata related to this project.
The top bar of the translator tool allowed the user to navigate between various points. Each time she started the application, it would automatically start with the last point edited. She could navigate to other points in the dataset by either toggling to the “next point” (hou yi ge 后一个) or “previous point” (qian yi ge 前一个). There was also a space where she could enter the unique ID of the point and go directly to that point (tiaozhuan 跳转). The button on the right of the top bar would open a new window to input the common words (see below).
The grid below the top bar included information that the translator used to identify problematic entries in the dataset. The string of Chinese characters on the left was the address that was automatically generated in Google Maps in 2010. The top-right string of characters was the Chinese name from the economic census, and the bottom-right string was the address from the census. To the left of these strings of characters was a button that could be selected when the automatically generated address and the census information were “in conflict” (bu yizhi 不一致) with one another. This information was used to select the points for the editor tool (see below).
Beneath the grid for comparing the address and site name was a tool to automatically generate the name of the establishment in simplified and traditional Chinese, its pinyin romanization, and an English translation. The application would recognize geographic names and remove any national, provincial, or prefectural names so that only the place name of a local village or town would appear in the name. In Figure 2, the tool recognized two layers in the original site name, a county (Zhangpu County 漳浦县) and a town (Chihu Town 赤湖镇), and removed the first so that only one place name would appear. After the name had been properly parsed to its local level, the translator would press the “translate” (fanyi 翻译) button, which would generate the corresponding traditional and pinyin equivalents. The conversion to traditional Chinese was based on the Microsoft Translator Text API (https://www.microsoft.com/enus/translator/translatorapi.aspx), while the pinyin romanization was based on a third-party pinyin library (https://github.com/hotoo/pinyin).
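The removal of higher-level place names can be sketched as a prefix-stripping step against a gazetteer. This is an illustration under stated assumptions, not the tool's actual implementation: the three-entry gazetteer and the sample site name are hypothetical, and a real list would need to cover every province, prefecture, and county.

```python
# Hypothetical gazetteer excerpt: higher-level units that should not
# remain in the local site name.
HIGHER_LEVEL_NAMES = ["福建省", "漳州市", "漳浦县"]


def strip_higher_levels(name, gazetteer=HIGHER_LEVEL_NAMES):
    """Remove leading province/prefecture/county names, keeping the local part."""
    changed = True
    while changed:
        changed = False
        for place in gazetteer:
            # Never strip the whole string: something local must remain.
            if name.startswith(place) and len(name) > len(place):
                name = name[len(place):]
                changed = True
    return name


print(strip_higher_levels("漳浦县赤湖镇基督教堂"))  # hypothetical site name
```

Looping until nothing changes lets the function handle names that stack several levels (province, then prefecture, then county) in any combination.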
The English translation for each site was generated by referencing a list of 550 common words that our team identified while working through the data. We identified nearly 600 different kinds of common names used for religious establishments. For example, there are over 50 different words in the census used to designate a church (e.g., jiaohui 教会, jiaohuidian 教会点, jidutang 基督堂, jidujiao huidian 基督教会点). These various words would be translated as “Church,” and the place name or church name would be rendered in pinyin. If the translator discovered a new word for a temple, church, or mosque, she could select the “common word” button at the top of the page. This would open a new window with a separate database of common words. We implemented this feature with DataTables (https://datatables.net), a powerful table plugin for displaying data.
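The common-word lookup can be sketched as a longest-suffix match against a shared dictionary. This is a simplified illustration: the four entries come from the examples above, while the function names, the sample site name, and the choice to pass the place name in as ready-made pinyin are assumptions made for the sake of a self-contained example.

```python
# Shared word bank: Chinese term for an establishment -> English rendering.
COMMON_WORDS = {
    "教会": "Church",
    "教会点": "Church",
    "基督堂": "Church",
    "基督教会点": "Church",
}


def translate_name(site_name, place_pinyin, common_words=COMMON_WORDS):
    """Return '<place in pinyin> <English term>', or None if no term is known."""
    # Try longer terms first so that 基督教会点 wins over 教会点.
    for term in sorted(common_words, key=len, reverse=True):
        if site_name.endswith(term):
            return f"{place_pinyin} {common_words[term]}"
    return None  # unknown term: a translator should add it to the word bank


def add_common_word(term, english, common_words=COMMON_WORDS):
    """Record a newly discovered term so later sites translate automatically."""
    common_words[term] = english


print(translate_name("赤湖基督教会点", "Chihu"))
```

Returning `None` for an unrecognized ending mirrors the workflow described above: the translator adds the new term once, and every later occurrence is handled automatically.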
The translator tool allowed the translator to identify problematic addresses and names, as well as to translate each point into four different names (simplified, traditional, pinyin, and English). This automated process enabled each point to be processed in roughly 30–60 seconds, as opposed to the 4–5 minutes required to input all the Chinese and English names by hand.
3. The Editor Tool
The second phase of cleaning this data relied on an Editor Tool (zongjiao xinxi bianji gongju 宗教信息编辑工具) that allowed two editors to study problematic points and make corrections based on existing records, especially the exact location of the religious buildings and the additional village and township information. The landing page for this tool was a national map of China that displayed the working status of every Chinese province. Only the provinces featured in orange contained data ready for the editors to use and could be clicked. The table was built with DataTables, and the vector map was built with jVectorMap (http://jvectormap.com), an open-source platform to display and select shapefiles for countries, provinces, and counties.
After the editor clicked an item in the table or on the map, a new tab appeared with the main Editor Tool interface. The top left corner of this page displayed the site name, the address from the census, and the automatically generated address. The editor would study these discrepancies and judge where the point should most likely be placed. After researching each point, the editor entered the correct data for the town or village in the top right-hand corner of the page. Based on their findings, the editor could also move the point on the map by pressing the “shift position” (yidong weizhi 移动位置) button above the map. There was also a search feature on the map where the editor could search for other points in Google or Baidu maps. Each time the editor moved a point, its position was stored in three different coordinate systems (those of Street Map, Satellite Map, and Baidu Map), so that future users could view the point using different kinds of online maps.
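Storing one point under several coordinate systems can be sketched as follows. The text does not name the three systems, so this example rests on an assumption: that two of them are GCJ-02 (used by maps licensed in China) and BD-09 (used by Baidu Map). The conversion shown is the widely circulated approximate formula between those two, not code from the project, and the sample coordinates are arbitrary.

```python
import math

X_PI = math.pi * 3000.0 / 180.0


def gcj02_to_bd09(lng, lat):
    """Approximate conversion from GCJ-02 to Baidu's BD-09 (degrees)."""
    z = math.hypot(lng, lat) + 0.00002 * math.sin(lat * X_PI)
    theta = math.atan2(lat, lng) + 0.000003 * math.cos(lng * X_PI)
    return z * math.cos(theta) + 0.0065, z * math.sin(theta) + 0.006


def bd09_to_gcj02(lng, lat):
    """Approximate inverse of gcj02_to_bd09."""
    x, y = lng - 0.0065, lat - 0.006
    z = math.hypot(x, y) - 0.00002 * math.sin(y * X_PI)
    theta = math.atan2(y, x) - 0.000003 * math.cos(x * X_PI)
    return z * math.cos(theta), z * math.sin(theta)


def stored_coordinates(gcj_lng, gcj_lat):
    """Keep every representation so any map service can display the point."""
    return {
        "gcj02": (gcj_lng, gcj_lat),
        "bd09": gcj02_to_bd09(gcj_lng, gcj_lat),
    }


record = stored_coordinates(113.8, 34.2)  # arbitrary sample point
print(record)
```

Saving each representation when the point is moved, rather than converting on demand, avoids accumulating error from repeated approximate conversions. A conversion to or from WGS-84 would follow the same pattern but is longer and is omitted here.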
Unlike the simple SQL queries in the previous application, this one proved more difficult, since at least two different editors would verify the location of each point. Furthermore, only the problematic items would appear in the queue for editors to work on. While the SQL query is similar to the one above, we had to add extra columns to record the work of different editors.
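A two-editor queue of this kind can be sketched with a few extra columns per site. The project used MySQL and PHP; the sketch below substitutes Python's built-in `sqlite3` module so that it runs on its own, and the table layout, column names, and function names are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory table with one pair of columns per reviewing editor."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE site (
               id INTEGER PRIMARY KEY,
               problematic INTEGER NOT NULL,
               editor1 TEXT, village1 TEXT,
               editor2 TEXT, village2 TEXT)"""
    )
    return conn


def next_in_queue(conn, editor):
    """Next problematic site still missing a review that this editor can give."""
    row = conn.execute(
        """SELECT id FROM site
           WHERE problematic = 1
             AND editor2 IS NULL
             AND (editor1 IS NULL OR editor1 <> ?)
           ORDER BY id LIMIT 1""",
        (editor,),
    ).fetchone()
    return row[0] if row else None


def record_review(conn, site_id, editor, village):
    """Store the review in the first free editor slot."""
    first = conn.execute(
        "SELECT editor1 FROM site WHERE id = ?", (site_id,)
    ).fetchone()[0]
    slot = "1" if first is None else "2"
    conn.execute(
        f"UPDATE site SET editor{slot} = ?, village{slot} = ? WHERE id = ?",
        (editor, village, site_id),
    )


def needs_adjudication(conn):
    """Sites reviewed twice where the two editors disagree."""
    rows = conn.execute(
        """SELECT id FROM site
           WHERE editor2 IS NOT NULL AND village1 <> village2
           ORDER BY id"""
    )
    return [r[0] for r in rows]


conn = make_db()
conn.executemany(
    "INSERT INTO site (id, problematic) VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)]
)
record_review(conn, next_in_queue(conn, "editor_a"), "editor_a", "牛堂村")
record_review(conn, next_in_queue(conn, "editor_b"), "editor_b", "张庄村")
print(needs_adjudication(conn))
```

The condition `editor1 <> ?` keeps an editor from being served a point they have already reviewed, and the disagreement query yields the list that a third person would then judge.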
At the time of writing, we have reached the end of cleaning the data as much as possible. Where the census indicated a specific address, we were able to find the corresponding site. Where the census only indicated a village or township name, we were able to find coordinates for the administrative center of that location. We estimate that at least 90 percent of the religious sites in the dataset have points accurate to within 5 km of the actual site. Since the maps in this atlas do not go beyond the county level, we estimate that the accuracy of the geographic points projected in this atlas is much higher. We have made the coordinates for these points, as well as the names and addresses, available online at the Online Spiritual Atlas of China (https://www.globaleast.org). The team is currently working on “publishing” the dataset at the Purdue University Research Repository. See the Center on Religion and Chinese Society’s website (https://www.purdue.edu/crcs/) for updates on this release.
4. Thoughts on Developing Databases on Contemporary Religion
While the exact workflow of our project might not be applicable to scholars outside the study of China, we think that the Translator and Editor Tools might help scholars of religions plan and execute the collaborative processing of large, complicated datasets of geographic data. In particular, these kinds of applications are useful because they allow multiple users to work simultaneously on different aspects of cleaning and analyzing the data. This is important because processing this data would be far too much for one person to do, especially without some way to automatically generate the addresses and translations for the sites.
In developing the collaborative workspaces, we found four aspects of this project that are relevant for similar studies. First, using Python code to automate this kind of work is essential for saving a research team’s time. If we were to manually look up each point in Google Maps or to translate each term in our database by hand, it would take many more months (or years) to deal with large datasets. It is important to remember, however, that these automated processes will not always be correct. In our case, we had to scan the data to see where the automatically generated addresses or translations failed, but we estimate that we increased our output ten-fold by incorporating such techniques.
Second, for projects that seek to translate data between different languages, it is helpful to develop an accretive dictionary of ways to translate names. In our dataset, we found nearly 600 common terms for religious establishments. It was beneficial to compile these names as we were translating the data, and it saved a lot of research time when the web application could automatically translate these terms after they had been entered into our common word bank. This was all the more beneficial because we had multiple translators and editors working with the data. A multi-user dictionary specific to our project helped different people adhere to one standard.
A third feature of our two-part research plan was that we were able to delegate the tasks of cleaning, translating, and collaborating on geographic data to different team members. The different applications ensured that team members could work simultaneously on separate aspects of the project. This also encouraged team members to develop specialized skills within their specific project tasks. Finally, our experience in working with the religious sites from the economic census reinforced the importance of a healthy dialogue between scholars of religions and technical specialists. During this project, sociologists of religion and civil engineers specializing in geospatial technologies came together to develop new tools. Processing the geospatial information and religious terminology would have been impossible without substantial knowledge of both of these fields. We found that the creation of collaborative web projects exemplifies the need for cross-disciplinary knowledge that bridges the humanities and digital technologies. Such a project, we find, is best carried out by teams in which specialists in different domains work together.
After cleaning this data, our team has already produced an atlas of Chinese religions that will be available for purchase by the fall of 2018. We have also produced six articles that use this data, and we will make the entire 2004 Economic Census data available for public download. We hope that this set of data will help future generations of scholars explore patterns in the general distribution and size of religious institutions in the early 21st century. We believe that public access to such datasets on China is beneficial to scholars, journalists, policymakers, and the public. Such datasets will provide a window into the religious climate in China and enable us to gain a clearer picture of the areas in which religion thrives and those in which it has a very limited presence.
When the atlas on Chinese religions is released this fall, we will also make our dataset publicly available. Throughout the process, we recognized the potential implications it might have for communities with tense relations to the government. Certainly, if our database gave the geographic coordinates of illegal or semi-legal institutions, it might serve as a kind of roadmap to expose a group and put its members in harm’s way. The dataset we described in this article is derived from the 2004 Economic Census and thus represents legal entities under PRC law. Obviously, such a dataset only gives data on the officially recognized churches, mosques, and temples. It does not give the full picture of all religions in China, since it does not include any points for illegal or semi-legal groups. But this official layer of data on religions offers an unparalleled insight into the distribution and frequency of religious sites across China. We expect that this will continue to serve as a foundation for analysis in years to come.
For more on these events, see Fenggang Yang, Atlas of Religion in China: Social and Geographical Contexts (Leiden: Brill, 2018); Ian Johnson, The Souls of China: The Return of Religion After Mao (New York: Pantheon, 2017), as well as his news articles such as “This Chinese Christian Was Charged With Trying to Subvert the State,” The New York Times, March 25, 2019.
Dingding Xin, “Unlawful Surveys to be Dealt Severely,” China Daily (March 7, 2007). For more on the dangers of collecting information on religious sites, see Fenggang Yang, “The Failure of the Campaign to Demolish Church Crosses in Zhejiang Province, 2013–2016: A Temporal and Spatial Analysis,” Review of Religion and Chinese Society 5.1 (2018): 5–25.
Lisa Spiro, “‘This Is Why We Fight’: Defining the Values of the Digital Humanities,” in Debates in the Digital Humanities, ed. Matthew Gold (University of Minnesota Press, 2012), 22.
The census has uneven measurements from China’s different provinces. For more on these complexities, see Carsten A. Holz, “China’s 2004 Economic Census and 2006 Benchmark Revision of GDP Statistics: More Questions than Answers?,” The China Quarterly 193 (March 2008): 154–156.