About

Read more about the conception of and context for CCP-AHC below.

Background

Between 1996 and 2008, the AHRB- and Jisc-funded Arts and Humanities Data Service (AHDS) supported the development and dissemination of digital arts, humanities, and culture (AH&C) scholarship. No single infrastructure has grown up in its absence; instead, computationally intensive research by AH&C researchers has been conducted in relative isolation. Although the promises of “big data,” e-Science, and grid computing have been recognised in turn by this community, only relatively few researchers have successfully redeemed them in shared-compute infrastructure in the last ten years (Bower et al. 2014; Terras et al. 2018; Ruan et al. 2018). The community now sees the value in the application of machine learning (ML) and AI. But the picture around HPC usage to support the computationally and energy-efficient large-scale use of these techniques is incomplete. Previous efforts to gather community requirements targeting AHRC-facing researchers provide some insights into HPC and cloud computing usage, where 38% of survey respondents used or were moving to “HPC/Cloud” in 2021 (Sufi et al. 2023). The culture within the community around research outputs, skills, and institutional support were viewed as other barriers to adoption (Barker et al. 2024).

Notwithstanding this, flagship UKRI AHRC DRI investments are increasingly interested in leveraging ML and AI methods on the data they produce or steward. Widening access to DRI is a funder priority (e.g. AHRC Strategic Delivery Plan 2022-2025) and recent North American delivery models provide inspiration for how this can be achieved (Dombrowski et al. 2024). The trend in other disciplines toward exposing workflows via well-maintained, usable, interactive applications (e.g. Galaxy, SBGrid) is also welcomed. However, more work is required to ensure that codes/toolchains and infrastructure are accessible and usable (Zundert 2012). Except for a few mature projects (e.g. the GATE NLP solution), AH&C-domain-specific codes/toolchains have a short lifespan, do not easily extend beyond the research context that produces them, tend to be written by autodidacts, and only exceptionally make use of shared-compute infrastructures. They may be developed internationally and may either originate in or have strong dependencies on codes and libraries developed in the commercial sector, posing further challenges to their scalability and sustainability. Sustainability also depends on a highly skilled technical workforce. This demand will grow as more research projects engage with the development of RTP career paths (e.g. Digital Humanities & Research Software Engineering Summer School).

Fortunately, sustainability and openness are generally welcomed in the community we target: finding ways to incentivise and implement permissive licensing, FAIR infrastructure, and other Open Science/Research principles is important to AH&C researchers (Arthur and Hearn 2021). Examples of these are the availability of AI-centric software outputs using open-source licences (e.g. AH/Y00745X/1), and novice-friendly, peer-reviewed tutorial publications (e.g. The Programming Historian) that support AH&C researchers to learn digital tools and techniques. In addition, European infrastructures (e.g. the CLARIN and DARIAH ERICs) and UK DRI actions, such as the scoping projects for future data services and repositories (e.g. AH/W007541/1, AH/W007592/1, AH/W007533/1), have increased the discoverability of codes, data infrastructures, and workflows, as well as the training required to make use of them.

CCP-AHC will build on some of these outcomes and networks to widen participation in HPC and, against the background of the convergence between AI and HPC, ultimately increase the uptake of large-scale AI by AH&C communities. We look to efforts from computationally intensive fields beyond the physical sciences, such as bioinformatics, which continue to support skills development (Zhan et al. 2019) and HPC use (Castrignanò et al. 2020). The US Department of Energy have shown how HPC skills infrastructure can tie workforce development initiatives to broader concerns, including widening participation in science, inclusive instructional design, and environmental sustainability (McInnes et al. 2023). Relatedly, the concurrent funding of the Software Sustainability Institute (SSI) led by AHRC (until 2028) and the BRAID programme (2022-2028) represents a compelling development: the flow of expertise from AHRC-facing researchers into computational sciences in the UK.