Buddhists have never hesitated to embrace new forms of communication. The earliest Indian epigraphy (3rd century BCE) and the earliest manuscript fragments in Indian languages (1st century BCE/CE) are connected to Buddhism. The earliest extant printed book, dated 868 CE, is a Chinese translation of the Diamond Sutra. Throughout its history Buddhism has used whatever means were available to encode, disseminate, and maintain its growing corpus. Buddhist texts were first composed and transmitted in India in a cultural environment that valued the mnemonic techniques of oral transmission. Later Buddhists became eager “early adopters” of two other emerging information technologies: writing and printing. Below I address the shift of Buddhist heritage information into the digital realm under three main headings: the digitization of Buddhist texts and images, the digitization of scholarly tools (dictionaries, bibliographies etc.), and the application of computational methods to those data.
1. Data: Digitization of Primary Sources
1.1. Digital Editions of Canonical Texts
Through the centuries Buddhists have managed their shifting corpora via the changing media of oral, handwritten, printed, and now digital text. Part of the conceptual apparatus for these endeavors is the notion of canonicity, a central and early concern for Buddhists. Although there is no single, stable Buddhist canon that is used in all traditions, the concept of canonicity, both fluid and robust, has played an important role in shaping how Buddhists perceive their textual heritage. It is thus not surprising that the first efforts were aimed at producing digital editions of the “canon.” Most of the digital canonical editions were created independently of each other, and as a result we have several overlapping versions of the Pāli, Chinese, and Tibetan canons. These are often modeled on different print editions and are in various ways emended, some more, some less transparently so.
The Pāli Canon exists in three major independent digital versions, most of which were created in the late 1980s and 1990s. These have been copied across the net, often with minor changes along the way. As a result, digital Pāli texts are easy to find online, but their provenance and editorial standards are often undefined. This makes them difficult to cite and to rely on for philological research.
Perhaps the most influential digital edition of Pāli Buddhist texts is the final CD version (Ver. 3) of the Chaṭṭha Saṅgāyana Edition that was published by the Vipassana Research Institute (VRI) in late 1999. As the name Chaṭṭha Saṅgāyana implies, the VRI corpus is a digitization of the printed canon as redacted by the sixth council that was held in Yangon from 1954 to 1956. The strengths of the VRI corpus are that the texts have been proofread, and that it alone among digital editions of the Pāli canon includes the commentaries (aṭṭhakathā) and sub-commentaries (ṭīkā). Markup links connect the commentaries to the mūla text, making it possible to build interfaces that present the mūla together with two layers of commentaries.
It is unclear whether, or to what extent, the online texts currently available on the VRI website, called Chaṭṭha Saṅgāyana Tripitaka Ver. 4.0, were edited beyond the last Chaṭṭha Saṅgāyana CD (Ver. 3) version. As with all digital editions of the Pāli canon there is a lack of technical documentation, or indeed any documentation or meaningful metadata. Digital editions need, like their print counterparts, information as to who created the resource, when and where, and what editorial decisions were made (and why) in converting the printed into a digital text. Development on the VRI corpus seems to have stopped some years ago, though a search engine for the corpus (Windows only) and an iPhone app have been made available. These days the best way to use the VRI corpus is via a browser extension called Digital Pāli Reader that is developed and maintained by Yuttadhammo Bhikkhu.
A second digital edition of the Pāli canon is the Sri Lankan Buddha Jayanti Tripitaka Project, which has digitized the Pāli Canon from the government-sponsored Sinhalese Buddha Jayanti edition (1956-1990), an edition that was created partly in response to the Burmese Chaṭṭha Saṅgāyana. The digital version of the Buddha Jayanti corpus seems less well proofread than the VRI corpus, but it too has been available since the 1990s, and can be found on various websites. Next to the core texts of the Pāli Tripiṭaka, the Buddha Jayanti corpus comprises a small number of paracanonical and commentarial works, as well as texts on history, grammar and rhetoric. One stable way of accessing the Buddha Jayanti corpus is via the Göttingen Register of Electronic Texts in Indian Languages (GRETIL) (see below).
The third digital Pāli corpus, still hardly noticed by the scholarly community, is the release of the Pāli Text Society edition online under a CC license via GRETIL. The digitization is the result of a collaboration between the PTS and the Dhammakaya Foundation in Thailand between 1989 and 1996. The original aim was, as so often in the 1990s, to produce a CD. After two CD versions, this line of distribution was discontinued, and in 2014 the texts were released on GRETIL. The digital PTS corpus so far consists only of the Pāli Vinaya, Sutta and Abhidhamma; none of the commentarial and paracanonical works from the PTS print series seems currently available digitally.
To date (June 2019), the files on GRETIL contain the contradictory copyright notice: “This file is (C) Copyright the Pali Text Society and the Dhammakaya Foundation, 2015. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.” Moreover, the PTS files in GRETIL contain the following disclaimer: “These files are provided by courtesy of the Pali Text Society for scholarly purposes only. In principle they represent a digital edition (without revision or correction) of the printed editions of the complete set of Pali canonical texts published by the PTS. While they have been subject to a process of checking, it should not be assumed that there is no divergence from the printed editions and it is strongly recommended that they are checked against the printed editions before quoting.”
For work with digital Pāli text, the recommendation at this stage is to search the VRI Pāli canon via Yuttadhammo’s browser extension in order to have full access to the commentarial strata, then use the PTS editions in print or pdf to corroborate difficult or doubtful passages.
For simple searches with convenient access to translations and parallels it is best to use the SuttaCentral website (see Sec. 2.2), which hosts emended versions of the VRI corpus, and might at some point add commentarial literature.
The history and dynamics of Buddhist canonical collections in Chinese have long been studied, especially in Japan. The production of digital Chinese Buddhist texts was and is slightly more challenging than for texts in Indian languages or Tibetan, because Chinese cannot be presented meaningfully in alphabetic transcription. For a long time the rendering of Chinese character variants was an endemic problem for Chinese digital text. Only the advent of Unicode in 1993, and especially the addition of Extension B in 2001, put an end to the confusion of encoding systems and normalization strategies that had plagued the digitization of ancient Chinese texts.
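The practical effect of Extension B can be illustrated with a small sketch. The Python snippet below is an illustration added here, not code from any of the projects discussed. It assumes the input is an already decoded Unicode string, flags characters in the CJK Unified Ideographs Extension B block (U+20000 to U+2A6DF), and counts the UTF-16 code units needed to store the string. The sample string is made up for the example.

```python
# Code point range of the CJK Unified Ideographs Extension B block.
EXT_B_START, EXT_B_END = 0x20000, 0x2A6DF


def extension_b_chars(text):
    """Return the characters of `text` that lie in Extension B."""
    return [ch for ch in text if EXT_B_START <= ord(ch) <= EXT_B_END]


def utf16_units(text):
    """Number of 16-bit code units needed to store `text` in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# Illustrative sample: two common characters around U+20000,
# the first code point of Extension B.
sample = "佛\U00020000法"

print([f"U+{ord(ch):X}" for ch in extension_b_chars(sample)])
print(len(sample), utf16_units(sample))
```

Characters in this range lie outside the Basic Multilingual Plane and therefore occupy two code units in UTF-16, which is one reason why a check of this kind can be useful before feeding rare-character text to older tools.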
As with the Pāli and Tibetan corpora, different organizations have produced independent digital editions of Chinese Buddhist texts, often framed around a specific canonical edition. The two main collections to date are the Chinese Buddhist Electronic Text Association (CBETA) corpus, and the SAT Daizōkyō Text Database (SAT for Saṃgaṇikīkṛtaṃ Taiśotripiṭakaṃ “Society for the Creation of the Taishō Tripiṭaka”). Both started out collaboratively as projects to digitize the Taishō Canon, the canonical edition of the Chinese Buddhist Canon that, published between 1924 and 1934, quickly became authoritative for East Asian Buddhist Studies.
CBETA was founded in Taipei in 1998 in emulation of the Pāli Text Society, with the aim of providing reliable digital versions of Buddhist texts to a user community that comprised Buddhist believers as well as researchers in Buddhist Studies. CBETA has always considered its main task to be the provision of accurate Chinese Buddhist texts in many different formats. The texts can be read online at www.cbeta.org, but are also available for download in multiple formats (epub, mobi, pdf etc.) for different devices and applications. Before the advent of the mobile versions most users availed themselves of the (Windows only) CD version (current: Version 2018) with the dedicated CB Reader search engine. A sophisticated online interface for researchers, with some analytic functions, was released in 2014.
CBETA published its first CD version containing Vols. 1-55 and 85 of the Taishō Canon in 1999. Since then it has successively added texts from numerous other canonical editions, most importantly the Manji shinsan zoku zōkyō 卍新纂續藏經 (Tokyo, 1905-1912), which contains 1230 Chinese Buddhist texts that were not part of the Taishō canon. In addition, the CBETA corpus comprises 285 texts from the Jiaxing Canon 嘉興大藏經, some 250 Buddhist temple gazetteers, and other texts not included elsewhere from various sources and editions. The 2018 version contained 4,621 texts.
Crucial for researchers in the digital humanities is that CBETA provides its corpus in XML/TEI format on GitHub. This is currently the most comprehensive and usable collection of open access Chinese Buddhist texts for the application of corpus linguistics and related forms of analysis.
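To give an impression of what such analysis can look like, here is a minimal Python sketch that pulls the running text out of a TEI document and counts character bigrams. It is a sketch under stated assumptions, not a description of CBETA's own tooling. The namespace is the standard TEI P5 one, and whether a given corpus file uses it should be checked. The `SAMPLE` document is invented. Skipping `note` elements is a simplification that would need adjusting to the actual markup.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI P5 namespace; that a given corpus file uses it is an assumption.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented miniature document standing in for a real corpus file.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>如是我聞<note>apparatus note</note>一時佛在</p>
    <p>舍衛國</p>
  </body></text>
</TEI>"""


def body_text(xml_string, skip=("note",)):
    """Return the text of the TEI body without whitespace or skipped elements."""
    root = ET.fromstring(xml_string)
    body = root.find(".//tei:body", TEI_NS)
    if body is None:
        return ""
    parts = []

    def walk(elem):
        # Strip the "{namespace}" prefix to get the local element name.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name not in skip:
            if elem.text:
                parts.append(elem.text)
            for child in elem:
                walk(child)
        # The tail follows the element and belongs to the parent's text flow.
        if elem.tail:
            parts.append(elem.tail)

    walk(body)
    # Remove all whitespace, which carries no meaning in this Chinese sample.
    return "".join("".join(parts).split())


text = body_text(SAMPLE)
bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
print(text)
print(bigrams.most_common(3))
```

Applied to a whole directory of files, counts of this kind are a typical first step toward the corpus-linguistic work mentioned above.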
The SAT corpus, maintained and hosted by a team at Tokyo University, is currently accessible via its websites, the most recent version of which is dated 2018. The interface allows for searches of all or selected sections of the Taishō canon (3283 texts) as well as the Jōdoshū zensho 浄土宗全書, the “Collected Works of the Pure Land School.” Especially helpful for translators is the linking of highlighted texts with the Digital Dictionary of Buddhism (see Sec. 2.1), and with the corpus of English translations in the Bukkyō dendō kyōkai (BDK) corpus of translations. The interface also allows users to view scans of the print edition, which is often helpful, especially when dealing with Siddham script, illustrations or variant characters.
Another recent contribution is a search interface for the heavily illustrated volumes 86-96 of the Taishō. The images can be searched by keywords, magnified, and tagged, and are published according to the IIIF standard. Considering the dearth of open data regarding Buddhist Art, this is a welcome addition.
CBETA and SAT are often seen as giving access to the same data, because both started out (and collaborated) on digitizing the Taishō. However, their text base has diverged over the last fifteen years and the actual overlap of searchable text consists only of about 2270 texts (Taishō Vols. 1-55 & Vol. 85), which is most of the Indian scriptures translated into Chinese, and the works composed in Chinese until c. the 8th century by Chinese, Korean and Japanese Buddhists. Those texts are nearly identical in both corpora, as both rely on the Taishō. However, the CBETA corpus offers revised punctuation for many texts, and has expanded the apparatus. It also provides (transparently marked) emendations where the CBETA editors judge the Taishō text to be erroneous and the mistake does not become apparent in an apparatus entry.
In addition to the 2270 texts shared between the SAT and the CBETA corpus, the SAT interface searches another c. 1220 Buddhist texts from the Taishō canon (Vols. 56-84 and Vols. 86-97), which are not contained in the CBETA corpus. These are mostly texts by Japanese authors written after the 8th century, the majority composed in Buddhist Chinese. In contrast, the CBETA corpus contains another c. 2351 texts from various sources, which are not accessible through the SAT website. These are mostly texts by Chinese authors, written after c. the 8th century.
As a rule of thumb, whoever studies Japanese Buddhism (or the Japanese commentarial tradition on Indian and Chinese texts) ought to work with the SAT website. For research on Chinese Buddhism one should make use of the latest version of the CBETA corpus.
To date, the main difference between the two projects from a DH perspective is that CBETA aims to produce a wide-ranging corpus of Chinese Buddhist texts, and distributes these texts in various formats under a CC license, whereas SAT aims to provide an online research platform for the Taishō edition of the Chinese canon.
A third digital project regarding the Chinese canon is the Tripitaka Koreana Knowledgebase Project by the Research Institute of the Tripitaka Koreana (Seoul). Like SAT it is an attempt to carefully model one particular edition, in this case the first and second printing of the Korean edition of the Chinese canon. The project, however, seems to have been dormant for some years now. The website has been offline at times in the past, but is currently (June 2019) accessible, albeit in dire need of maintenance and internationalization. The Tripitaka Koreana Knowledgebase at one point offered scans of the surviving portions of the first printing of the canon, and could be an important resource for research into the printing history of the Buddhist canon, but in its current state is difficult to use, at least via the English interface.
Various groups have worked on electronic editions of Tibetan canonical collections and the “Collected Works” (gsung ‘bum) by later authors. The two most widely used digital collections are the Asian Classics Input Project (ACIP), and the Buddhist Digital Resource Center (BDRC) (formerly Tibetan Buddhist Resource Center).
Since 1987 the ACIP has produced distributables in plain-text format of the Kangyur (bka’ ‘gyur), the Tangyur (bstan ‘gyur) and various Sungbum collections. The text entry of ACIP was accomplished by involving Tibetan communities in India, Tibet and Mongolia. The produced texts lack metadata about the editions from which they were created, which limits their usability for research on the level of individual texts. As a whole, however, the plain-text corpus might be used for corpus linguistic analysis and related forms of research.
BDRC was founded as the Tibetan Buddhist Resource Center in 1999 by E. Gene Smith (1936-2010), who dedicated his life to the preservation and dissemination of Tibetan texts. BDRC has digitized, cataloged, and archived a large number of culturally significant works, securing the once critically endangered Tibetan literary corpus and making it widely accessible. Smith’s effort counts among the great success stories in cultural heritage preservation. Most works are distributed under a Creative Commons license. A few texts are restricted based on cultural commitments to stakeholders.
Some texts contained in the BDRC corpus are distributed as scans (in PDF) from original documents, others are available as full text. Both come with metadata, which helps to trace provenance, and makes them usable for research on the level of individual texts. Currently, the website allows download only of single texts and even for this a user account is needed. However, the BDRC team is ready to consider requests in case full-text access is needed for DH-related analysis.
In 2015, the Board of Directors voted to broaden the Center’s preservation mandate to include texts in languages beyond Tibetan, including, among others, Sanskrit, Chinese, and Pāli. To reflect its expanded mission, the Center’s name was changed to Buddhist Digital Resource Center.
Many researchers in Tibetan Buddhism make use of both the ACIP and the BDRC collection in one form or another. They often search for material via Paul Hackett’s Buddhist Canons Research Database (see Sec. 2.2).
Still another Tibetan canon project is less well known, but deserves more attention. The “Resources for Kanjur & Tanjur Studies” (rKTs) provides both transcriptions and scanned images of a range of printed and handwritten editions via a minimalist website. It is part of the Tibetan Manuscripts Project Vienna (TMPV) directed by Helmut Tauscher and a good place for online research into the edition history of the Tibetan canon.
Though one or more canonical collections in Sanskrit might have existed in India at one point, none have survived in toto, and no Sanskrit “canon” as such has been translated as a whole. Nevertheless, a great wealth of Buddhist Sanskrit texts have survived, often complete in the monasteries of Nepal and Tibet, sometimes fragmentary in the sands of South and Central Asia.
GRETIL (the “Göttingen Register of Electronic Texts in Indian Languages and related Indological materials from Central and Southeast Asia”) is among the oldest and best managed repositories for digital Indian texts. It is safely hosted by the Niedersächsische Staats- und Universitätsbibliothek Göttingen, and its formidable team avoided early on the principal mistake of digital resources (over-reliance on interface) by distributing well-curated text files with basic metadata. Conceived as a repository for Indology in general, GRETIL has become a valuable resource for digital textual studies of Indian and Pāli Buddhism. The main file format is HTML with some metadata information at the beginning of each file. Longer texts are often split into several files. To date it contains some 250 Buddhist Sanskrit texts that can be downloaded in one single zip-archive. In another major contribution, GRETIL added digitized versions of fourteen Sanskrit dictionaries in CSX format in 2017, allowing for aggregated search in platforms such as GoldenDict.
The other Buddhist Sanskrit project of note is the Digital Sanskrit Buddhist Canon, since 2003 maintained by the University of the West. The digital texts are produced from available print editions at the Nāgārjuna Institute of Buddhist Studies in Kathmandu. Over the years, the Nāgārjuna Institute has assembled an important collection of texts that are otherwise difficult to find. The texts have been proofread, have basic metadata associated, and are made available through the project website. The interface, however, is lacking in faceted search and, as so often, does not offer the data as an archive for download. Fortunately, many of the texts are shared with permission via GRETIL, from which users can assemble their own collections.
Though there is a considerable overlap between GRETIL and the Digital Sanskrit Buddhist Canon, there are texts which are unique to either of the repositories. At this stage, the recommendation is to get the latest version of the Buddhist Sanskrit texts from GRETIL and search the files via a text editor or a grep-like command line tool. In addition, one should search the Digital Sanskrit Buddhist Canon.
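For readers without a grep-like tool at hand, the same kind of search can be approximated in a few lines of Python. The following is a minimal sketch, assuming the downloaded files have been unpacked into a local directory and are encoded in UTF-8. The function name `search_corpus` and the directory name in the commented example are hypothetical. Both pattern and text are normalized to NFC so that precomposed and decomposed diacritics match each other.

```python
import re
import unicodedata
from pathlib import Path


def search_corpus(root, pattern, suffixes=(".htm", ".html", ".txt")):
    """Return (file name, line number, line) for every matching line."""
    # Normalize to NFC so that precomposed and decomposed diacritics match.
    regex = re.compile(unicodedata.normalize("NFC", pattern), re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Assumes UTF-8; undecodable bytes are replaced rather than raising.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            line = unicodedata.normalize("NFC", line)
            if regex.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits


# Example call; "gretil_buddhist" is a hypothetical local directory name.
# for name, lineno, line in search_corpus("gretil_buddhist", r"tathāgatagarbha"):
#     print(f"{name}:{lineno}: {line}")
```

Because the files are HTML, matches may include markup. For many lookups this is tolerable, but a more careful version would strip tags before searching.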
1.2. Thematic Collections
Next to collections of “canonical” texts, however defined, there are a number of important digital collections that do not work with the canon as a primary category for organizing text, but which are centered in various ways on geographic regions, single material collections, topics or genres.
The Huntington Photographic Archive of Buddhist and Asian Art is the largest independent archive of Buddhist art. It represents the field documentation efforts by John and Susan Huntington from 1969 to the present. The archive was first established at Ohio State University in 1986. The collection is currently being accessioned by the University of Chicago Libraries, where it will be permanently housed and maintained.
The physical Huntington Archive contains more than 250,000 original slides and photographs documenting the artistic traditions of Asia from ancient to modern times. The collection emphasizes Buddhist material, but also includes significant holdings on Asian art in general. Currently, some 60,000 photographs are available online, with the goal of having all remaining images online by 2020. In addition to the image database, the Archive’s website includes online exhibitions and educational materials on Asia and Asian art, including useful maps. The database is currently transitioning its metadata to the VRA Core (Ver. 4) standard that will help to unify the Archive’s terminology and classification system.
The data is so far not published under an open license, but made available to researchers online without charge. To reproduce the images in publications, researchers still need permission from the archive and fees might apply.
The Tibetan and Himalayan Library (THL) was begun in 2000 under the leadership of David Germano and was one of the first large online collections of scholarly information about Tibet. Today the THL is designed, according to its website, as “a publisher of websites, information services, and networking facilities relating to the Tibetan plateau and southern Himalayan regions.” Its interface provides access to a large collection of c. 70,000 photographs, audio and visual material, a map collection, and Tibetan language tools. Among these the “THL Tibetan to English Translation Tool” is especially noteworthy.
Tied to the canonical catalogs are helpful bibliographies of secondary literature for many texts in the canon. The THL is designed as an online library and offers no distributable data. The emphasis is on linking historical and textual information to be accessible online in the THL interface.
Another project that provides image data is the International Dunhuang Project (IDP) that aims at making the richness of Central Asian manuscripts available. These manuscripts are indispensable for the study of late Indian, medieval Chinese, and early Tibetan Buddhism. IDP was established in 1994 to coordinate international teams of conservators, catalogers, researchers and digitization professionals to ensure the preservation of the Eastern Silk Road collections and to make them freely available online. Hosted by the British Library, IDP has brought together collections and stakeholders from the UK, France, Germany, Russia, China and Japan. Although not all collections have been fully digitized so far (about 30% of the Stein collection remains unscanned), and not all that is digitized is released, much has been made available and is distributed via the IDP website. IDP currently offers access to over half a million images of over 100,000 manuscripts, paintings, artifacts, and photographs.
Also focused on manuscripts is the Digital Library of Lao Manuscripts, which aims to preserve the rich heritage of Laotian Buddhist manuscripts (15th to 20th century). It contains images of c. 12,000 texts, which are findable by title, ancillary term, language, script, category, material, location, and date via an exemplary faceted search function. The data is the happy result of the Preservation of Lao Manuscripts Programme of the Lao Ministry of Information & Culture, which was supported by the German Ministry of Foreign Affairs from 1992 until 2004. According to the website the criteria for the selection for microfilming were “historico-cultural importance, cultural diversity or regional representation, age (all manuscripts over 150 years old) and quality of the manuscript. Within these general guidelines, priority for microfilming was given to extra-canonical literature, all manuscripts which were thought to represent indigenous literary traditions, and all texts of a non-religious nature.” The texts were originally preserved in microfilm format, but have been digitized and are now made available both via an online interface and packaged with professional metadata for download in pdf format.
In Taiwan, there are the projects conducted at the Library and Information Center of the Dharma Drum Institute of Liberal Arts. Over the last fifteen years a steady series of some twenty projects has produced open data on various aspects of Buddhist culture. To mention only five:
- Among the larger projects at the Library and Information Center was a visualization platform for Gaoseng zhuan collections, i.e. Buddhist biographical literature, that allowed users to explore the information in different views, e.g. on a map or as a social network. The social network information that was produced during this project is the largest of its kind in Buddhist Studies.
- The Digital Archive of Buddhist Temple Gazetteers resulted in the full-text digitization of some 250 local histories of Buddhist temples, which are important sources for the study of Buddhism in late imperial China. Data and metadata are published in the form of METS archives, the full text is encoded in XML-TEI.
- The Catalog Database of Republican Era Buddhist Journals is a detailed online catalog of two large print collections of Buddhist periodicals published between 1912 and 1950. It allows for searches by topic and genre.
- Buddhist Temples in Taiwan is a geo-referenced dataset of c. 5500 temples in Taiwan, which includes historical and religious information about most sites. Data and metadata are published in XML. An online interface visualizes the distribution of temples on a timeline. The data is joined with the largest image database of temples in Taiwan.
- The Buddhist Authority Databases were designed to provide authority data for various projects at Dharma Drum that needed to disambiguate the names of persons and places, text titles, and East Asian calendar dates. The data is made available in XML in packaged archives which are updated on a monthly basis. There is also an open API for outside projects that wish to work with linked data. As of 2018, the Dharma Drum Buddhist Person Authority, which is continually developed and expanded, is the largest digital onomasticon for Buddhist Studies.
Next to long term projects that enjoy strong institutional support, sometimes outstanding data collections are provided by individual researchers. Special mention should be made of the well-curated Ancient Buddhist Texts of Venerable Ānandajoti, who has prepared a number of annotated digital editions of Pāli and Sanskrit texts. All the material, which contains rare works from the grammatical tradition, is available via his website and downloadable in pdf, epub and mobi format. Value is added to many of the better known sūtra texts by the annotation. The material is well packaged, and, one would hope, will one day find its way into a long-term archive facility such as Zenodo.
Whether individual or institutional projects, long term sustainability is always an issue for digital resources, as is the creation of an environment for collaborative research. Thankfully, data producers are increasingly aware of this and Dharma Drum, Ancient Buddhist Texts, SuttaCentral and many others make archival packages of their data available. More and more open Buddhist data can be found archived on GitHub and other version-controlled platforms. Such repositories open the possibility of new forms of collaborative research, e.g. the joint development of digital editions.
One repository for premodern Chinese texts that is designed along those lines is Kanripo, developed since 2013 by Christian Wittern. The repository combines texts produced in-house with texts collected from other projects and sources on the internet. It combines the large Siku quanshu 四庫全書 collections, which contain the output of 2500 years of Confucian literati culture, the Daoist canon and the CBETA corpus, thus allowing users to research terminology across genres and traditions. The total number of texts in Kanripo is currently close to 10,000 individual items. Texts that are available multiple times in these collections are consolidated into one entry and, where available, digital facsimiles of the text are juxtaposed with the full text. The texts are released under a Creative Commons (BY-SA) license. The project works with users who want to add texts to the repository and works towards a “sinological common” that can provide reliable sources for research.
Most of the resources mentioned so far concentrate on collecting and presenting texts in only one Buddhist language. However, in Buddhist Studies almost every text is a cluster of texts and everyday research practice often consists in comparing different translations. To prepare aligned text that assists with such comparisons is time consuming, and only a few multi-lingual digital editions have been attempted so far. The largest project is the Thesaurus Literaturae Buddhicae (TLB), which was developed by Jens Braarvig as part of the Bibliotheca Polyglotta. The website presents dozens of multilingual Buddhist text clusters in Sanskrit, Tibetan, Chinese and English. The texts are chunked in (loosely defined) sentence or paragraph units. So far the data is limited to online use; one would hope that the dataset of linked text could one day be made available for download. The alignment of sentence-size chunks would be very helpful for e.g. the computational analysis of translation vocabulary.
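How aligned chunks could support such an analysis can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions. The three aligned pairs in `ALIGNED` are invented toy data and are not taken from the TLB or any edition, and the function name `target_ngrams` is hypothetical. The sketch counts which Chinese character n-grams occur in chunks whose Sanskrit side contains a given stem, a crude first approximation to finding translation equivalents.

```python
from collections import Counter

# Invented toy alignment (Sanskrit chunk, Chinese chunk); not from any edition.
ALIGNED = [
    ("bodhisattvo mahāsattvaḥ", "菩薩摩訶薩"),
    ("bodhisattvasya caryā", "菩薩行"),
    ("tathāgato 'rhan", "如來應供"),
]


def target_ngrams(aligned, source_stem, n=2):
    """Count target-side character n-grams in chunks whose source contains the stem."""
    counts = Counter()
    for source, target in aligned:
        if source_stem in source:
            counts.update(target[i:i + n] for i in range(len(target) - n + 1))
    return counts


print(target_ngrams(ALIGNED, "bodhisattv").most_common(1))
# prints [('菩薩', 2)]
```

On real data one would compare these counts against the n-gram frequencies in the non-matching chunks, since raw co-occurrence favors strings that are common everywhere.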
For a Chinese Āgama text (T.100) a detailed, aligned TEI edition with all Chinese, Pāli, Sanskrit and Tibetan parallels has been prepared for a project at Dharma Drum. Similarly, Chinese, Sanskrit and Tibetan versions of the Yogācārabhūmi are available in a dedicated interface.
2. Digital Tools for Scholarship
After the digitization of primary sources, the other obvious targets of digitization were structured research tools such as dictionaries, catalogs, and bibliographies, which could be modeled relatively easily as databases.
The multilingual character of the Buddhist tradition has resulted in a large number of dictionaries, glossaries and encyclopedias. Lexicography is not merely a concern for modern Buddhist Studies, but has been part of Buddhist scholasticism for centuries.
While, on the one hand, dictionaries originally designed for print have been digitized, one of the most widely used online dictionaries of Eastern Buddhism is the “digital-native” Digital Dictionary of Buddhism (DDB), an original and innovative creation by Charles Muller. Muller started the DDB as a private dictionary lookup tool in the late 1980s and took it online in the 1990s, making it one of the earliest surviving online tools for Buddhist Studies. Conceived of as a collaborative project, it has since received contributions of various sizes and types from over 300 scholars. For each Chinese lemma, the DDB provides an array of definitions, which are individually credited to their contributors. It offers pronunciations in Mandarin, Korean, Japanese and Vietnamese, as well as pointers to print dictionaries that contain the lemma. The DDB has incorporated the Soothill-Hodous Dictionary of Chinese Buddhist Terms and Lewis Lancaster’s The Korean Buddhist Canon: A Descriptive Catalogue and, as of 2019, contains c. 72,000 entries. The DDB is a subscription service, but visitors can query up to ten terms per day by using a generic log-in (“guest”) without password.
¶ 52 Leave a comment on paragraph 52 0 Another dictionary that was created digitally by and for Buddhist scholars is the Dictionary of Gāndhārī by Stefan Baums and Andrew Glass. As of 2016, this deeply erudite work had c. 6700 entries. It is based on the growing corpus of manuscripts written in the Gāndhārī prakrit that was used in northwestern India, Pakistan and Central Asia between c. 300 BCE and 400 CE. The entries provide a gateway to editions of some of the earliest surviving Buddhist texts. The data is currently not available outside of the interface.
¶ 53 Leave a comment on paragraph 53 0 For those who prefer to use offline lookup tools, currently the best solution is to use a dictionary platform such as GoldenDict or Babylon and add Buddhist dictionaries and glossaries in formats such as StarDict, Babl, or CSX, many of which are available on the web. The DILA collection of Glossaries for Buddhist Studies alone provides fourteen free Buddhist dictionaries and glossaries, which can be searched simultaneously, e.g. in GoldenDict. Together with the newly digitized Sanskrit dictionaries available through GRETIL, concurrent search across a large number of digitized dictionaries is a convenient way to get an overview of the semantics of a term.
¶ 54 Leave a comment on paragraph 54 0 However, whether online or offline, digital lexicography has drawbacks for learners. Dictionary platforms reduce a dictionary to its entries, i.e. users can rarely consult the editorial principles laid out in the introductions, or even access its list of abbreviations. Working with a dictionary one needs to understand its editorial intent and organizing principles, which are obscured when dozens of entries from different works appear in the same interface. Nevertheless, the aggregation of dictionary data from dozens of reference works into concurrent lookup tools saves much time and shelf space.
“Sutra catalogs” are important for the study of Buddhism, especially in the world of Chinese and Tibetan Buddhism, where canonical editions were printed in large sets and kept in temples and libraries. Without a catalog as finding aid such editions would have been unusable. In the 20th century, researchers have created meta-catalogs which establish the connections between the Pāli, Sanskrit, Chinese, and Tibetan corpora. In the digital realm the two largest of these, which have incorporated many of the printed catalogs, are the Buddhist Canon Research Database, which specializes in Tibetan and Sanskrit material, and SuttaCentral, which specializes in early Buddhist texts (i.e. roughly the content of the Pāli canon and its parallels).
¶ 56 Leave a comment on paragraph 56 0 The Buddhist Canon Research Database was started by Paul Hackett in the mid-1990s as a digital catalog to the Tibetan Buddhist canon with full-text searching and linked dictionary look-up. In 2010, the collection catalog, together with a bibliography of secondary literature, and accompanied by hypertext links to online resources, was made publicly available on servers at Columbia University. The following year, localization of the interface for nine different languages was implemented, together with the addition of bibliographic data for select sets of indigenous Tibetan commentaries. In 2013, an updated interface with full-text search capability accessing the full text of the complete Tibetan canon was added.
¶ 57 Leave a comment on paragraph 57 0 At present, the online resource contains approximately 10,000 bibliographic records for primary texts, with another 12,000 bibliographic records for the associated secondary literature, while the full-text interface offers search features for a large corpus of Tibetan canonical texts. The database is maintained and continues to be developed, but is so far not shared beyond the interface.
¶ 58 Leave a comment on paragraph 58 0 Under the leadership of Venerable Sujato the team at SuttaCentral has assembled a comprehensive database of early Buddhist texts, linking texts in Pāli, Chinese, Sanskrit and Tibetan with modern translations in a delightful richness of languages that includes not only the usual suspects, but also Malay/Indonesian, Vietnamese, Korean, Czech, Hungarian and many more.
¶ 59 Leave a comment on paragraph 59 0 SuttaCentral was founded in 2005 by Bhikkhu Sujato, Rod Bucknell, and John Kelly as a web service for the sutta parallel tables developed by Rod Bucknell and Venerable Anālayo. As of 2017, SuttaCentral lists nearly 50,000 parallels and hosts over 60,000 texts in 39 languages. It also provides several dictionaries, including an ongoing revision of Buddhadatta’s concise dictionary (now as the New Concise Pāli-English Dictionary).
¶ 60 Leave a comment on paragraph 60 0 Most of the translations have originally been adapted from pre-existing, open access work, but new translations are now being prepared specifically for the site. A new translation of the four Pāli nikāyas has been completed by Bhikkhu Sujato and is scheduled to be published in 2018. In addition, a new translation of the Pāli Vinaya by Bhikkhu Brahmali is underway and is being progressively added to the site. These new translations use a segmented approach, in which text and translation are matched segment by segment, resulting in addressable passages. This design can support a new generation of semi-automated and/or crowd sourced translations and, if widely adopted, could lead to a new canonical reference system.
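The idea of matching text and translation segment by segment can be illustrated with a minimal Python sketch. It is not SuttaCentral's actual data format or code: the segment identifiers (`sutta1:1.1` etc.) and the helper names `align` and `untranslated` are hypothetical, chosen only to show how shared segment keys make passages addressable and make gaps in a translation easy to detect.

```python
# Hypothetical segment identifiers; not an actual reference scheme.
root_text = {
    "sutta1:1.1": "Evaṁ me sutaṁ.",
    "sutta1:1.2": "Ekaṁ samayaṁ bhagavā sāvatthiyaṁ viharati.",
    "sutta1:1.3": "Tatra kho bhagavā bhikkhū āmantesi.",
}
translation = {
    "sutta1:1.1": "Thus have I heard.",
    "sutta1:1.2": "At one time the Blessed One was staying at Sāvatthī.",
}


def align(root, trans):
    """Pair each root segment with its translation (None if not yet translated)."""
    return [(seg_id, text, trans.get(seg_id)) for seg_id, text in root.items()]


def untranslated(root, trans):
    """List the identifiers of root segments that have no translation yet."""
    return [seg_id for seg_id in root if seg_id not in trans]


aligned = align(root_text, translation)
missing = untranslated(root_text, translation)

for seg_id, source, target in aligned:
    print(seg_id, "|", source, "|", target)
```

Because every segment carries its own key, a second translation, a commentary, or a machine-generated draft could be attached to the same keys without touching the root text.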
¶ 61 Leave a comment on paragraph 61 0 Material on SuttaCentral is covered by a variety of licensing conditions. The original texts are in the public domain. Legacy translations are available under a variety of licenses, mostly permitting non-commercial use. New translations are dedicated to the public domain via Creative Commons Zero. All data and software is freely available on Github, packaged in an exemplary fashion. SuttaCentral encourages reuse and copying of its data.
¶ 62 Leave a comment on paragraph 62 0 Both the Buddhist Canon Research Database and SuttaCentral are aggregate sites that have grown out of comparative catalogs. They have incorporated large amounts of digitally available material, improved accessibility, and continue to curate the data by adding links and corrections. A long term trend seems to be that projects that started out as digitization of canonical editions like CBETA and BDRC tend to add on catalogs and multilingual linking as they develop, while projects that start out as cataloging databases such as SuttaCentral tend to add on full text.
¶ 63 Leave a comment on paragraph 63 0 A catalog resource focusing on the Chinese canon was created by the late Aming Tu. The Digital Database of Buddhist Tripitaka Catalogs is still the best way to check online which of the many Chinese canonical editions contain a given sutra.
¶ 64 Leave a comment on paragraph 64 0 Michael Radich and Jamie Norrish have developed the “Chinese Buddhist Canonical Attributions Database” that synthesizes past research on the vexing problems that surround the translatorship attributions in the Chinese Buddhist catalog tradition.
Bibliographic control of secondary scholarship is a fundamental part of research. In order to make a contribution we must know whether our questions have been addressed before. The struggle to avoid repetition is after all what separates modern academic from traditional scholastic practice. Thus we consult bibliographies, which in the case of Buddhist Studies are of irritating range. Buddhist Studies began in the 19th century and many early works are still useful, citable, and now, with the Internet Archive, more easily available than ever. Already the earliest monographic bibliography of Buddhist secondary literature (Held 1916) contained c. 2500 items. Academic communication in Buddhist Studies is still conducted internationally in Japanese, Chinese, English, and French. German and Italian on the other hand used to be read internationally, but today – like Korean, Dutch, Russian, Nepali, Vietnamese, Thai, Sinhalese and many others – are hardly ever used beyond their national boundaries. Nevertheless important secondary literature exists in all these languages and should ideally be taken into account.
¶ 66 Leave a comment on paragraph 66 0 Buddhist bibliography in print has been prolific in the 20th century – my Bibliography of Buddhist Studies Bibliographies lists some 148 items. In the digital, two important long-term initiatives, the NTU Digital Library and Museum of Buddhist Studies and the Indian and Buddhist Studies Treatise Database (INBUDS), provide an entry point into the vast amount of secondary literature in Chinese and Japanese. The more recent H-Buddhism Bibliography Project has the potential to become a strong, community-maintained bibliographic database.
¶ 67 Leave a comment on paragraph 67 0 The Digital Library and Museum of Buddhist Studies, hosted at National Taiwan University (NTU), is neither a library nor a museum, but rather the most comprehensive online bibliography of Buddhist Studies. It started as a cooperation between Dharma Drum and the NTU Philosophy Department and is now maintained and developed by the NTU Library. In its early stages an agreement with INBUDS allowed for the inclusion of much of the INBUDS dataset. As of June 2019, according to the project website, the database contains c. 405,000 entries and 60,000 full-text articles. The Digital Library and Museum of Buddhist Studies collects data in any language from all fields of Buddhist Studies, but its coverage of Chinese secondary literature is especially advanced. The interface is adequate, but a download of the dataset is not possible. One can, however, obtain one’s query results via email.
The Indian and Buddhist Studies Treatise Database (INBUDS) was a visionary endeavor when it was conceived in the late 1980s, before the days of the World Wide Web. Going online in 1998, it still maintains the largest bibliography of Japanese secondary literature on Buddhist Studies. Western and Chinese literature on the other hand are not well represented. It seems that in recent years INBUDS has become integrated into the national Japanese CiNii database service, the meta-catalog for Japanese university libraries and academic journals published in Japan. Very laudably and hardly noticed, INBUDS has made a snapshot of its dataset (as of 2015.03) available for download that contains c. 69,000 entries. Much could be done with this dataset (building author-topic networks, identifying research trends over time, incorporating it into the H-Buddhism Bibliography Project etc.).
The H-Buddhism Bibliography Project was started in 2012 by Charles Muller, the creator of the Digital Dictionary of Buddhism (see Sec 2.1), who is also responsible for more than 90% of the entries. It is a Zotero-based attempt to pool the many private bibliographies that users of the H-Buddhism mailing list have developed, but contribution from the community has been relatively low, as few researchers keep their bibliographies in structured formats (BibTeX, EndNote, RIS etc.). Currently, there are c. 9300 items on file. This is less than either the NTU Digital Library and Museum or INBUDS, but for DH purposes the difference is that the records are available in an easily computable format and, via the Zotero platform, can be easily converted to other formats, e.g. for use in LaTeX or for research in trend analysis.
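What “easily computable” means in practice can be shown with a minimal Python sketch that reads a few records in the RIS tag format and counts publications per year. The records below are invented placeholders, not entries from the H-Buddhism bibliography, and the parser `parse_ris` is a simplified illustration that handles only basic `TAG  - value` lines; a real export would call for an established bibliography library.

```python
import re
from collections import Counter

# Invented placeholder records in RIS format; not taken from any real bibliography.
sample_ris = """TY  - JOUR
AU  - Author, A.
PY  - 2010
TI  - Placeholder title one
ER  -
TY  - BOOK
AU  - Author, B.
PY  - 2010
TI  - Placeholder title two
ER  -
TY  - JOUR
AU  - Author, C.
PY  - 2015
TI  - Placeholder title three
ER  -
"""

# A RIS line is a two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def parse_ris(text):
    """Split RIS text into a list of records, each a dict of tag -> list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line)
        if not match:
            continue  # skip blank or malformed lines
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end of record
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


records = parse_ris(sample_ris)
# Keep only the four-digit year from the PY field.
per_year = Counter(r["PY"][0][:4] for r in records if "PY" in r)

for year, count in sorted(per_year.items()):
    print(year, count)
```

The same per-year counts, computed over a full export, would be a first step toward the kind of trend analysis mentioned above.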
¶ 70 Leave a comment on paragraph 70 0 Apart from these general bibliographies for Buddhist Studies a few specialized digital bibliographies created by individual researchers are openly available. Part of the “Epistemology and Argumentation in South Asia and Tibet” (EAST) project led by Birgit Kellner is a meticulous, multi-lingual bibliography on scholarship about late Buddhist Indian texts. There is also Dan Martin’s TibSkrit, these days available on Dropbox. My own Bibliography of Translations from the Chinese Buddhist Canon first went online in 2001 and collects to date c. 1200 translations of c. 550 texts into “Western” languages.
3. Going forward: New paths for research
¶ 71 Leave a comment on paragraph 71 0 Detractors at times accuse the Digital Humanities of failing to make good on their promise to deliver new results, and indeed it seemed at times as if the growing amount of data was inverse to the amount of new approaches or research questions that are asked of it. This criticism, however, overlooks two things.
First, we simply are still very much at the beginning – lasting developments in academic methodology often span generations, and the juggernaut of digitization and related technologies of the last 20 years has been nothing like the relatively stable phase in academic communication that preceded it – the age of print. It is quite possible that we will not see a defined and stable set of research methods for some time to come. Just as the editions of the Buddhist canon that became authoritative in the 20th century, such as the PTS or the Taishō, have dissolved into a moving array of digital corpora, our methods to tackle those texts too might stay very much in flux for the foreseeable future.
¶ 73 Leave a comment on paragraph 73 0 Second, just like the technologies of digitization, methodological developments in the Digital Humanities have originated elsewhere. The conceptual tools to deal with digital cultural heritage data were almost never invented by Humanists, but adopted and adapted from existing methods in data science. There is no reason why Humanists should not make occasional use of Principal Component Analysis or a chi-squared test, considering such methods are quite common elsewhere in academia. Still, there is an extra effort involved in the adoption of these methods. In many neighboring fields (e.g. geography, archeology, sociology, computational linguistics), as well as in more distant departments (e.g. the life sciences), researchers have long been adapting statistical and computational methods to meet their own needs – albeit with less resistance from their peers. One problem is that training in even basic statistical and computational methods is almost never part of graduate programs in the Humanities, and students have to rely on their own initiative to acquire the necessary skills. Thus, although digitization has certainly changed the way Humanists query their data, and has made a vast amount of hitherto neglected works available, the application of computational methods to analyze the data has been lagging. Larger fields such as literature, medieval studies, and classics, which are a few years ahead of Buddhist Studies in the creation of their corpora, have in recent years seen a steady stream of DH informed analysis.
¶ 74 Leave a comment on paragraph 74 0 However, even in the field of Buddhist Studies a few initiatives have begun to use computational methods. In the field of East Asian Buddhism, for instance, one of the more pressing research questions is the translatorship of pre-Tang dynasty texts in the canon, many of which are unattributed or were wrongly attributed by traditional catalogs. We are still in the early stages of exploring how best to assess and compare Buddhist sutras computationally, but first results are quite encouraging.
Michael Radich and Jamie Norrish have developed a set of programs called TACL (“Textual Analysis for Corpus Linguistics”), for the large-scale analysis of digitized texts in the Chinese Buddhist canon and similar corpora. The core functionality is simple: TACL compares two or more user-defined texts or corpora of any size, to find either (a) strings common to all sides of the comparison (intersect), or (b) strings unique to one side (difference). At present, TACL operates only on literal contiguous strings, i.e. it cannot handle fuzzy matching or patterns that bracket intervening text. Further functionality includes filtering and sorting of the results, concatenation of multiple tests (feeding of results from one test into a new test) etc.
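The two core operations can be illustrated with a minimal Python sketch. This is not TACL's actual implementation or interface: the function names (`ngrams`, `intersect`, `difference`), the n-gram length limits, and the short sample strings are assumptions made for illustration. It only demonstrates the underlying idea of comparing literal contiguous strings.

```python
def ngrams(text, min_n=2, max_n=4):
    """Return the set of all contiguous character n-grams of length min_n..max_n."""
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def intersect(*texts, min_n=2, max_n=4):
    """Strings common to all sides of the comparison."""
    sets = [ngrams(t, min_n, max_n) for t in texts]
    return set.intersection(*sets)


def difference(text, *others, min_n=2, max_n=4):
    """Strings found in `text` but in none of the other texts."""
    result = ngrams(text, min_n, max_n)
    for other in others:
        result -= ngrams(other, min_n, max_n)
    return result


# Short sample strings, for illustration only.
a = "如是我聞一時佛在"
b = "聞如是一時佛遊"

common = intersect(a, b)
only_a = difference(a, b)

print(sorted(common, key=len))
print(sorted(only_a, key=len))
```

Holding every n-gram in memory works for a handful of short texts; a corpus the size of the Chinese canon would require an indexed store, which this sketch does not attempt.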
¶ 76 Leave a comment on paragraph 76 0 TACL has significant power to help scholars locate evidence bearing on problems of ascription, dating, textual history, earlier sources, later reception and impact, textual circulation, and other aspects of intertextual relations. In a series of publications, Radich has piloted application of the tool to a range of typical research problems.
Jen-jou Hung and I have been interested in similar issues and are working towards improving algorithms that can show textual anomalies and distinctive differences between translation styles. After developing a variable n-gram algorithm for Classical Chinese texts, we have applied clustering algorithms to identify translatorship and translation date. The Digital Archive of Buddhist Gazetteers can be used as a benchmark corpus for named entity recognition and allows for diachronic analysis of how certain places were perceived or of the association of individuals with places. Parallel to this we have built various datasets based on markup that contain facets which can be utilized in two other prominent fields of DH application: geographic and social network analysis (SNA).
Both geographic and social network analysis used to be the preserve of specialists, who had to invest significant time and effort to master the software and methodology. Moreover, neither GIS nor SNA data relevant to the study of Buddhism was readily available in digital form. This has changed in recent years. Datasets have been created and made openly available, while open source tools such as QGIS and Gephi have lowered the threshold for applying these perspectives. Applied to the study of Buddhism, geographic perspectives can e.g. elucidate regional patterns in the growth or decline of Buddhist institutions, or visualize patterns of pilgrimage travel. Historical network analysis can help us to identify key players or cliques in past and present Buddhist networks, and answer questions regarding patronage or the flow of information over time through the networks of monastic and lay practitioners.
Looking back over what has been accomplished in the creation of digital resources for Buddhist Studies, the successes with regard to the digitization of texts are spectacular. Canonical collections of most Buddhist traditions (we are still missing a few outliers such as Tangut or Khotanese) are now available in digital form. Crucially, these digital repositories now surpass any single canonical print tradition in terms of volume, acquisition cost, searchability, and portability. The latter is rarely mentioned, but deeply affects many of us, as digitization has freed researchers to work wherever they like. 20 years ago only a few large research libraries held paper copies of all canonical editions. Today we can search and compare sutra passages in Chinese, Sanskrit, and Tibetan in a café overlooking the ocean, in a monastery deep in the Himalayas, or on an airplane in transit between the two. Reliability of transcription, which in the beginning seemed a serious concern, is now hardly ever mentioned, as researchers can complement their full text searches with large facsimile collections in pdf or djvu.
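How a network perspective singles out “key players” can be sketched in a few lines of Python. The edge list below is a hypothetical toy network with placeholder labels, not historical data, and degree centrality is only one of several possible measures; tools such as Gephi offer many more.

```python
from collections import defaultdict

# Hypothetical edge list; the actor labels are placeholders, not historical persons.
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("D", "E"),
]


def degree_centrality(edge_list):
    """Share of the other actors in the network that each actor is directly tied to."""
    neighbours = defaultdict(set)
    for u, v in edge_list:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


centrality = degree_centrality(edges)
key_player = max(centrality, key=centrality.get)

for node in sorted(centrality, key=centrality.get, reverse=True):
    print(node, round(centrality[node], 2))
```

In this toy network actor A is tied to three of the four other actors and therefore ranks highest. With real data the interpretation depends heavily on how the ties were defined and recorded in the sources.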
¶ 80 Leave a comment on paragraph 80 0 Digitization, beyond the confines of academic practice, also stands to affect the tradition itself. As primary texts have been made widely available, they have found more readers than ever. It is for future generations of researchers to understand how this opening of the archive will influence Buddhism itself, where traditionally the canon as a whole has remained out of reach for most believers (mostly for practical reasons, but in esoteric Buddhism also on doctrinal grounds). Writing this on the 500th anniversary of the Reformation, for which vernacularization played such a crucial role, one cannot help but wonder what will happen to the teachings of the Buddha when all ancient and modern versions of all texts become instantly accessible.
¶ 81 Leave a comment on paragraph 81 0 One change that is already discernible is that the authoritative canonical editions which for many decades guided scholars into studying certain texts, have been dissolved into corpora and now live on merely as conventions of citation. The canon is dead, long live the corpus.
While the digitization of primary texts has been overwhelmingly successful, the digitization of images, objects, and spaces has just begun. Apart from the Huntington Archive and Ānandajoti’s Photodharma so far no archive of Buddhist art has succeeded. Many museums today make images of their holdings available, but an archive with faceted search across institutions and geared to Buddhist iconography still needs to be built. Technologies for 3D scanning and printing of objects are perhaps too new and still too much in flux to warrant a major initiative, but they have strong potential both for teaching and for research. The digitization of Buddhist sacred spaces is also still in its early stages, but as the essay of Quintman and Schaeffer in this volume illustrates, it can provide researchers and students with plenty of new data. Digitized text might save a trip to the library, but well documented digital spaces can save a trip to India or China.
¶ 83 Leave a comment on paragraph 83 0 Now that much data has moved from print to digital, Humanists can complement their methods with statistical and computational approaches. As our traditional methods continue to work just fine, we can take our time with this (and we do). Indeed only time will tell whether the application of analytic methods that depend on digital data can substantially advance Buddhist Studies. Judging from the success of past digitization initiatives that have profoundly changed the scope of primary data that we now can query and access, there is reason for optimism.
Bingenheimer, Marcus. 2015. “The Digital Archive of Buddhist Temple Gazetteers and Named Entity Recognition (NER) in Classical Chinese.” Lingua Sinica 1:8 (2015), pp. 1-19.
Bingenheimer, Marcus. 2016. “‘Knowing the Paths of Pilgrimage’ – The Network of Pilgrimage Routes in 19th century China according to the Canxue zhijin 參學知津.” Review of Religion and Chinese Society Vol. 3-2 (Issue on Geospatial Studies on Chinese Religions): 189-222.
¶ 87 Leave a comment on paragraph 87 0 Bingenheimer, Marcus. 2018. “Who was ‘Central’ for Chinese Buddhist History? – A Social Network Approach.” International Journal of Buddhist Thought and Culture. Vol.28-2 (Dec. 2018): 45-67.
¶ 88 Leave a comment on paragraph 88 0 Bingenheimer, Marcus, Jen-Jou Hung, Simon Wiles, Bo-yong Zhang. 2016. “Modeling East Asian Calendars in an Open Source Authority Database.” International Journal of Humanities and Arts Computing Vol. 10-2, pp. 127-144.
¶ 89 Leave a comment on paragraph 89 0 Bingenheimer, Marcus, Jen-Jou Hung, Cheng-en Hsieh. 2017. “Stylometric Analysis of Chinese Buddhist texts – Do different Chinese translations of the Gaṇḍavyūha reflect stylistic features that are typical for their age?” Journal of the Japanese Association for Digital Humanities Vol. 2 (2017): 1-30.
¶ 90 Leave a comment on paragraph 90 0 Chen, Chin-chih. 2004. Fan fan-yü: ein Sanskrit-chinesisches Wörterbuch aus dem Taishō-Tripiṭaka. Unpublished PhD Diss (Rheinische Friedrich-Wilhelms-Universität Bonn).
¶ 93 Leave a comment on paragraph 93 0 Hung, Jen-jou, Marcus Bingenheimer, Simon Wiles. 2010. “Quantitative Evidence for a Hypothesis regarding the Attribution of early Buddhist Translations.” Literary and Linguistic Computing (2010) 25(1): 119-134.
¶ 94 Leave a comment on paragraph 94 0 Hung, Jen-Jou 洪振洲, Marcus Bingenheimer 馬德偉, Zhi-Wei Xu 許智偉. 2010. “漢文佛典的語意標記與應用：《高僧傳》文獻的時空資訊視覺化和語意搜尋 – Semantic markup for Chinese Buddhist texts and its Application- A Platform for Querying and Spatio-temporal Visualization of the Biographies of Eminent Monks”, National Chengchi University Journal of Librarianship and Information Studies 國立政治大學圖書與資訊學刊. Vol.2, No.3 (No.74) (Aug 2010): 1-24.
Muller, Charles A. forthcoming 2018. “The Digital Dictionary of Buddhism and CJKV-English Dictionary: A Brief History.” In D. Veidlinger (ed.) Introductions to Digital Humanities: Buddhism. New York & Berlin: DeGruyter.
¶ 97 Leave a comment on paragraph 97 0 Norman, Kenneth Roy. 1983. Pāli Literature – including the canonical literature in Prakrit and Sanskrit of all Hīnayāna schools of Buddhism. Wiesbaden: Harrassowitz.
¶ 99 Leave a comment on paragraph 99 0 Radich, Michael. 2015. “Tibetan Evidence for the Sources of Chapters of the Synoptic Suvarṇaprabhāsottama-sūtra T664 Ascribed to Paramārtha”. Buddhist Studies Review 32, no. 2 (2015): 245-270.
¶ 101 Leave a comment on paragraph 101 0 Radich, Michael and Anālayo Bhikkhu. 2017. “Were the Ekottarika-āgama 增壹阿含經 T 125 and the Madhyama-āgama 中阿含經 T 26 Translated by the Same Person? An Assessment on the Basis of Translation Style.” In Research on the Madhyama-āgama, edited by Dhammadinnā, 209-237. Taipei: Dharma Drum Publishing Corporation.
¶ 103 Leave a comment on paragraph 103 0 Slingerland, Edward, Ryan Nichols, Kristoffer Neilbo, Carson Logan. 2017. “The Distant Reading of Religious Texts: A “Big Data” Approach to Mind-Body Concepts in Early China.” Journal of the American Academy of Religion, 2017: 1-32.
¶ 105 Leave a comment on paragraph 105 0 Wittern, Christian. 2016. センター研究年報2015 CIEAS Research Report 2015 – 特集漢籍リポジトリ Special Issue: Kanseki Repository. Kyoto: Center for Informatics in East Asian Studies, Institute for Research in the Humanities, Kyoto University.
¶ 106 Leave a comment on paragraph 106 0 Wynne, Alexander. 2013. “A Preliminary Report on the Critical Edition of the Pāli Canon being prepared at Wat Phra Dhammakāya.” Thai International Journal of Buddhist Studies Vol.4 (2013): 135-170.
¶ 107 Leave a comment on paragraph 107 0 (*)I am grateful for responses by: Venerable Ānandajoti, Rupert Gethin, Paul Hackett, Susan Huntington, Bryan Levman, Charles Muller, Michael Radich, Miroj Shakya, Sam Van Schaik, Venerable Sujato, Venerable Upatissa, Jeff Wallman, and Christian Wittern. Any mistakes in the information about the resources and all opinions expressed about them are my own. The discussion below is meant to be comprehensive, but it is not complete, and I apologize to all deserving projects that I fail to mention. I have chosen to assess only projects of which at least some data is made freely available.
E.g. the World Tipiṭaka Edition, produced in Thailand, that aims to improve on the Burmese Chaṭṭha Saṅgāyana edition. Another large, on-going edition project to watch is the Dhammachai Tipiṭaka Project (2010-) in Thailand. This also aims at a digital edition (Wynne 2013, Levman Forthcoming), but as of today, digital text does not yet seem to be available.
¶ 111 Leave a comment on paragraph 111 0 At: https://tipitaka.org. All URLs given here and below were accessed Jun 2019 where not otherwise indicated. The VRI edition was also published in a Devanagari print edition.
¶ 112 Leave a comment on paragraph 112 0 https://pali.sirimangalo.org and https://github.com/yuttadhammo/digitalpalireader (the repository also contains XML versions of the Thai and Burmese editions of the Pāli Canon). Originally developed as Firefox extension, the tool works now only with Firefox derivative browsers such as Pale Moon.
The point is probably moot as most of the texts are in the public domain right now and the digital text does not add content that can be copyrighted. To me, at least, it seems that both the assertion of copyright and the CC license are void, but, as with all legal matters, this is decided by courts, not scholars.
¶ 114 Leave a comment on paragraph 114 0 Nozawa (2003) lists more than 1200 works related to the topic of the “canon” (Daizōkyō 大藏經) published in Japan between 1879 and 2003. In English see Wu & Chia (2016).
¶ 118 Leave a comment on paragraph 118 0 At: https://github.com/cbeta-git/xml-p5a. For textual analysis of the Taishō texts only this version is preferable: https://github.com/cltk/chinese_text_cbeta_taf_xml.
This collection contains 493 texts (Main Collection 306 texts + Supplement 187 texts) from the Japanese Pure Land school. The text titles link to the online texts at http://jodoshuzensho.jp/jozensearch, where the collection is maintained. Parts of this collection overlap with the Taishō.
¶ 122 Leave a comment on paragraph 122 0 The latter practice carries the danger that users may be unaware that they are looking at an emendation. This is mainly an interface issue, as the emendations are transparently marked in the digital master text (XML/TEI) and the original wording is always preserved.
¶ 124 Leave a comment on paragraph 124 0 At: http://kb.sutra.re.kr. For search: http://kb.sutra.re.kr/ritk_eng/search/searchBranch.do. I was not able to search the database with current versions of Firefox, Opera, or Chrome.
¶ 134 Leave a comment on paragraph 134 0  At: http://idp.bl.uk. Single collections can sometimes be accessed through other portals as well. E.g. the Pelliot collection has been made available through Gallica (gallica.org), the Berlin Turfan Collection can be found at the Digitales Turfan Archiv (http://turfan.bbaw.de/dta/).
¶ 138 Leave a comment on paragraph 138 0 Bingenheimer, Hung, Wiles (2011). A follow-up project added figures from the Song, Yuan, Ming and Qing dynasties. The current size of the network is at c. 16000 actors (data available at: http://mbingenheimer.net/tools/socnet/).
¶ 141 Leave a comment on paragraph 141 0 The Place Name Authority contains a large number of entries originally created by the GIS team at Academia Sinica. This part of the data is not included in the downloadable archives.
¶ 142 Leave a comment on paragraph 142 0 At: https://ancient-buddhist-texts.net. Ānandajoti has also assembled a rich collection of over 13,000 photos of Buddhist art and sites at: https://www.photodharma.net.
In the field of Buddhist Studies, we were e.g. lucky that the translations from the Pāli Canon collected on the Access to Insight webspace have been incorporated into the Suttacentral corpus (s.b.). Access to Insight (https://accesstoinsight.org) was started in 1993 by John Bullitt, but, although still online, the website is no longer maintained and since 2013 users are advised to download the Legacy Edition (https://accesstoinsight.org/tech/download/bulk.html). Suttacentral has also incorporated the comprehensive site of German translations of Pāli texts, which was created by Wolfgang Greger (http://palikanon.de) and first went online in 1998.
¶ 145 Leave a comment on paragraph 145 0 The project aims to collect all texts from the Siku quanshu 四庫全書, Sibu congkan 四部叢刊, Zhengtong Daozang 正統道藏, Daozang jiyao 道藏輯要 and the CBETA corpus 電子佛典集成. A few texts are still missing as of Sep. 2017.
¶ 150 Leave a comment on paragraph 150 0 Norman dates the earliest extant work of Pāli lexicography, the Abhidhānappadīpika, to the late 12th century (Norman 1983: 166). For Buddhist Chinese, where the prolific translation and transliteration of Indian terms made glossaries indispensable, the earliest glossary is the Fanfanyu 翻梵語 (T.2130), at least parts of which can be dated to the 6th century (Chen 2004). For Tibetan – and later Mongolian and Manchu – Buddhist texts the most influential glossary that unified translation practice was the Mahāvyutpatti, compiled in the 8th to 9th century.
¶ 153 The latter is based on the important “all_index.xml” data, a thorough index of many East Asian dictionaries of Buddhism. Started by Urs App and Christian Wittern, it is now maintained by Charles Muller, who makes the 2010 version available here: http://www.buddhism-dict.net/ddb/allindex-intro.html. The all_index is an important tool for computational linguistics on East Asian Buddhist texts. It contains the largest number of indexed terms (c. 290,000) of all tools. There is also a stardict version of it.
¶ 154 At: http://glossaries.dila.edu.tw/. Among the available glossaries are: A Chinese Translation of A.P. Buddhadatta’s Concise Pali-English Dictionary, Jeffrey Hopkins’ Tibetan-Sanskrit-English Dictionary, the Mahāvyutpatti in Sanskrit, Tibetan and Chinese, a Pentaglot Dictionary of Buddhist Terms (only Sanskrit, Manchu, Mongolian and Chinese), Digital Index of Noun & Verb Ending from Rod Bucknell’s Sanskrit Manual, Soothill-Hodous’ A Dictionary of Chinese Buddhist Terms. Dharma Drum has also collaborated with Seishi Karashima to produce distributable versions of his fine glossaries, painstakingly prepared from the author’s original files. Karashima created glossaries for Dharmarakṣa’s and Kumārajīva’s translations of the Lotus Sūtra, Lokakṣema’s translation of the Aṣṭasāhasrikā Prajñāpāramitā, and others.
¶ 161 These figures are somewhat inflated. Even cursory use reveals numerous redundant records, i.e. different entries that reference the same item. Also, not all of the full-text material is accessible.
¶ 164 At: https://www.dropbox.com/s/zwsf1bv6upp376d/Tibskrit%202014.doc. Other versions can be found on the web, including a stardict glossary from the 2008 version.
¶ 166 See for example the penetrating critical review of computational literary studies (“what is robust is obvious (in the empirical sense) and what is not obvious is not robust”) by Da (2019).
¶ 167 See for example the “pamphlets” series at the Stanford Literary Lab (https://litlab.stanford.edu/pamphlets), or the recent supplement “The Digital Middle Ages” to Speculum Vol.92 (2017) (S1) edited by D. Birnbaum, S. Bonde, and M. Kestemont. For recent applications of distant reading methods to Chinese religious texts see Slingerland et al. (2017).
¶ 172 Datasets for the application of historical GIS to the study of Buddhism can be found at: http://mbingenheimer.net/tools/histgis/. SNA data is available here: http://mbingenheimer.net/tools/socnet/.
¶ 173 For the application of a GIS-informed perspective to Buddhist history see e.g. Hung, Bingenheimer, Xu (2010), Bingenheimer (2016) and other articles in the special issue on geo-spatial studies of the Review of Religion and Chinese Society Vol. 3-2 (2016) (edited by J. Pettit & J. Protass), or the contribution of Pettit, Yang and Huang to this volume.