Four pivotal goals integrate novel aspects of digitization, workflows, outreach, and citizen science.

Goal 1: Establish a novel cryptobiotic consortium: Integrating over six million bryophyte and lichen records.

Combine information on bryophytes, lichens, and their symbiotic organisms, including geographically based links to co-occurring organisms (fungi, algae, invertebrates, microbes), and reach out to communities globally and across taxonomic boundaries. This includes merging two hitherto disparate portals (CNABH, CNALH) into a single data management interface based on the Symbiota software platform. The global integration of bryophyte and lichen occurrence data will enrich studies of the ecology and evolution of cryptobiotic communities. Symbiota is web-based and employs a novel data model, information linking, and algorithms to provide highly dynamic customization (Gries et al. 2014). We will extend its current functionality in several ways, including improved handling of specimen images, enhanced georeferencing, and other tools outlined below.

Goal 2: Complete the digitization of bryophyte and lichen label data and specimens, global in scope, held in 25 US herbaria.

We will digitize 1,185,460 bryophyte and lichen herbarium specimens owned by collaborators across the consortium and make the data accessible to facilitate a broad spectrum of research, both traditional and cutting-edge. This includes the first-ever plan to image physical specimens of bryophytes and lichens at this scale. Metadata associated with 18,000 endophytes (fungal strains that live inside lichens and bryophytes), many of which represent new species, will also be added.

Goal 3: Create a connected world: Innovative automation, integration, image tagging, and machine learning.

Applications of machine learning have achieved stunning levels of performance in computer vision tasks (LeCun et al. 2015). Partnering with computer scientists, we will pilot handwriting recognition systems and automated label transcription using machine learning. Linked data environments and visualized collections data increase capacity for data discovery, yet tracking these diverse datasets across digital repositories can prove challenging (Beaman & Cellinese 2012; James et al. 2018; Soltis et al. 2018). Recognizing the growing demand for linked data and cyberinfrastructure, we will develop search and automatic linking of genetic sequence data (e.g., GenBank), add algorithms to investigate co-occurrences of organisms, and build AI tools that will tag misidentified specimens.

Goal 4: Focus on public engagement and education.

Digitized natural history collections have become tremendous assets for research in environmental and health sciences, but these data remain largely untapped by educators (Cook et al. 2014). Natural history collections are uniquely poised to broaden access and opportunities for K-12 and undergraduate education, as well as public engagement (Bakker et al. 2019). There is great potential in using crowdsourced science and online technology to unlock data from digital images of specimens (von Konrat et al. 2018). This proposal has specific goals to connect digitization and natural history collections to education and the general public.

Bakker, F.T., A. Antonelli, J. Clarke, J.A. Cook, S.V. Edwards, P.G.P. Ericson, S. Faurby, N. Ferrand, M. Gelang, R.G. Gillespie, M. Irestedt, K. Lundin, E. Larsson, P. Matos-Maraví, J. Müller, T. von Proschwitz, G.K. Roderick, A. Schliep, N. Wahlberg, J. Wiedenhoeft, and M. Källersjö, The Global Museum: Natural history collections and the future of evolutionary biology and public education. PeerJ Preprints, 2019. 7: p. e27666v1.

Beaman, R. and N. Cellinese, Mass digitization of scientific collections: New opportunities to transform the use of biological specimens and underwrite biodiversity science. ZooKeys, 2012. 209: p. 7–17.

Cook, J.A., S.V. Edwards, E. Lacey, R.P. Guralnick, P.S. Soltis, D.E. Soltis, C. Welch, K.C. Bell, K.E. Galbreath, C. Himes, J.M. Allen, T.A. Heath, A.C. Carnaval, K.L. Cooper, M. Liu, J. Hanken, and S.M. Ickert-Bond, Natural history collections as emerging resources for innovative education. BioScience, 2014. 64(8): p. 725–734.

Gries, C., E.E. Gilbert, and N.M. Franz, Symbiota – A virtual platform for creating voucher-based biodiversity information communities. Biodiversity Data Journal, 2014. 2: p. e1114.

James, S.A., P.S. Soltis, L. Belbin, A.D. Chapman, G. Nelson, D.L. Paul, and M. Collins, Herbarium data: Global biodiversity and societal botanical needs for novel research. Applications in Plant Sciences, 2018. 6(2): p. e1024.

LeCun, Y., Y. Bengio, and G. Hinton, Deep learning. Nature, 2015. 521: p. 436–444.

Soltis, P.S., G. Nelson, and S.A. James, Green digitization: Online botanical collections data answering real-world questions. Applications in Plant Sciences, 2018. 6(2): p. e1028.

von Konrat, M., T. Campbell, B. Carter, M. Greif, M. Bryson, J. Larraín, L. Trouille, S. Cohen, E. Gaus, A. Qazi, E. Ribbens, T. Livshultz, T. Suwa, T. Peterson, Y. Rodriguez, C. Vaughn, C. Yang, S. Aburahmen, B. Carstensen, and J. Martinec, Using citizen science to bridge taxonomic discovery with education and outreach. Applications in Plant Sciences, 2018. 6(2): p. e1023.