Big Data in Archaeological Science – Data Management and Data Evaluation
The primary aim of the CCM Group's data management activities is to follow the FAIR data principles (Findable, Accessible, Interoperable, Reusable; https://www.go-fair.org/fair-principles/). Particularly for Cultural Heritage items or structures, it has to be kept in mind that repeated sampling or repeated analyses are commonly not possible. Therefore, once an analytical investigation is initiated, it should be treated as a unique opportunity, and the collected data and metadata have to be precisely documented and sustainably stored. Furthermore, access to the collected data for authorized scholars should be arranged either through external data repositories or through internal databases. The ceraDAT database, which is developed and administered by the CCM Group, is the prototype of a relational database for archaeological ceramics. At this stage the ceraDAT database contains the elemental compositions of more than 11,000 ceramic artifacts from the Eastern Mediterranean region, analysed by neutron activation analysis (NAA) in different laboratories. The data have been calibrated on the basis of the analytical routine and can be accessed via a web application. Apart from the elemental compositions, the database contains reference patterns of specific sites, metadata concerning archaeological information, geographical distribution, literature references and previous statistical evaluations. Owing to the database design, the structure can be extended in the future to include petrographic and mineralogical information.
ceraDAT database (ceradat.net)
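To illustrate what a relational layout for such data could look like, the following minimal sketch uses Python's built-in sqlite3 module. The table names, column names and all values are hypothetical and chosen only for illustration; they are not the actual ceraDAT schema or data. The query averages the element concentrations of all samples from one site, which is a strongly simplified stand-in for a site reference pattern.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative assumptions,
# not the actual ceraDAT layout.
SCHEMA = """
CREATE TABLE site (
    site_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE sample (
    sample_id  INTEGER PRIMARY KEY,
    site_id    INTEGER NOT NULL REFERENCES site(site_id),
    laboratory TEXT,
    reference  TEXT
);
CREATE TABLE measurement (
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    element   TEXT NOT NULL,
    value_ppm REAL NOT NULL,
    PRIMARY KEY (sample_id, element)
);
"""


def build_demo_db():
    """Create an in-memory database filled with invented demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO site VALUES (1, 'Demo site', 35.0, 33.0)")
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?, ?)",
        [(1, 1, "Lab X", "placeholder"), (2, 1, "Lab Y", "placeholder")],
    )
    conn.executemany(
        "INSERT INTO measurement VALUES (?, ?, ?)",
        [(1, "Fe", 50000.0), (1, "Cr", 200.0),
         (2, "Fe", 54000.0), (2, "Cr", 240.0)],
    )
    conn.commit()
    return conn


def site_means(conn, site_id):
    """Return the mean concentration per element for one site."""
    rows = conn.execute(
        """
        SELECT m.element, AVG(m.value_ppm)
        FROM measurement AS m
        JOIN sample AS s ON s.sample_id = m.sample_id
        WHERE s.site_id = ?
        GROUP BY m.element
        ORDER BY m.element
        """,
        (site_id,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    print(site_means(build_demo_db(), 1))
```

Storing one row per sample and element, rather than one column per element, is one way to keep such a structure extensible: further kinds of information could be added as new tables linked to the sample table without altering the existing ones.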
Furthermore, collaboration and networking with existing and future data repositories are being explored and extended. The ultimate aim is to virtually combine databases that cover different categories of data, materials and material origins.
Apart from data management, innovative approaches to data evaluation and categorization are assessed and tested. For this purpose, conventional multivariate statistics are combined with data mining, pattern recognition and machine learning techniques. A pilot study using Artificial Neural Networks (ANN) and Self-Organizing Maps (SOM) for the evaluation of compositional data has provided promising results. In this case study, transport jars from a Hellenistic agora in Cyprus were analysed with portable XRF (pXRF). The pXRF data were initially evaluated using hierarchical clustering. As an alternative, unsupervised pattern recognition through the generation of SOMs was tested while varying parameters such as the element suite, grid size, learning rate and radius of the neighborhood. The SOMs eventually provided a largely comparable clustering.
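The role of the SOM parameters mentioned above can be illustrated with a minimal, generic sketch in plain Python. It is not the implementation used in the pilot study. The function names, the linear decay of learning rate and radius, the Gaussian neighborhood and the demo values are all assumptions made for illustration. The grid size corresponds to rows and cols, the initial learning rate to lr0, the initial neighborhood radius to radius0, and the element suite to the columns included in each data row.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the node whose weights are closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, n_iter=500, lr0=0.5, radius0=None, seed=0):
    """Train a rectangular SOM with online updates and a Gaussian neighborhood."""
    rng = random.Random(seed)
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    # Initialise every node with a copy of a randomly drawn sample.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # radius shrinks, floored at 0.5
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            # Squared distance on the grid between this node and the winner.
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))
            for k in range(len(w)):
                w[k] += lr * h * (x[k] - w[k])
    return weights


# Invented demo values: two well separated groups, three "elements" each.
DEMO = [
    [1.0, 2.0, 1.1], [1.2, 2.1, 0.9], [0.9, 1.9, 1.0],
    [9.0, 8.0, 9.1], [9.2, 8.1, 8.9], [8.9, 7.9, 9.0],
]

if __name__ == "__main__":
    som = train_som(DEMO, rows=1, cols=2)
    for sample in DEMO:
        print(sample, "->", best_matching_unit(som, sample))
```

In this sketch each sample is assigned to the grid node with the closest weight vector, so samples with similar compositions end up on the same or on neighboring nodes. Real compositional data would normally need some form of scaling before training, which is left out here.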
SOM of the pXRF data set and heat map comparing initial hierarchical clustering with SOM clusters
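A comparison of two clusterings, such as the one shown in the heat map, rests on a cross-tabulation that counts how many samples fall into each combination of cluster labels. The following minimal sketch shows one way to compute such a table. The function name and the example labels are hypothetical and do not reproduce the study data.

```python
from collections import Counter


def cross_tabulate(labels_a, labels_b):
    """Count samples for every pair of labels from two clusterings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must describe the same samples")
    counts = Counter(zip(labels_a, labels_b))
    row_keys = sorted(set(labels_a))
    col_keys = sorted(set(labels_b))
    # Pairs that never occur are counted as 0 by Counter.
    table = [[counts[(r, c)] for c in col_keys] for r in row_keys]
    return row_keys, col_keys, table


if __name__ == "__main__":
    # Invented labels: hierarchical clusters as letters, SOM clusters as numbers.
    hierarchical = ["A", "A", "B", "B", "B"]
    som_clusters = [0, 0, 1, 1, 0]
    rows, cols, table = cross_tabulate(hierarchical, som_clusters)
    print("rows:", rows, "columns:", cols)
    for key, counts in zip(rows, table):
        print(key, counts)
```

If the two methods agree, most samples of each row concentrate in a single column. Counts spread over several columns indicate groups that the two methods split differently.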
Publications
Hein, A., Kilikoglou, V., ceraDAT – Prototype of a web based relational database for archaeological ceramics, Archaeometry 54, 2 (2012) 230-243. DOI: 10.1111/j.1475-4754.2011.00618.x
Hein, A., Kilikoglou, V., Compositional variability of archaeological ceramics in the Eastern Mediterranean and implications for the design of provenance studies, Journal of Archaeological Science: Reports 16 (2017) 564-572. DOI: 10.1016/j.jasrep.2017.03.020
Hein, A., Kilikoglou, V., Sustainable data management in the study of ancient materials – using the example of archaeological ceramics, in A. Sarris (ed.) Best Practices of Geoinformatic Technologies for the Mapping of Archaeolandscapes, Archaeopress (2015) 251-260.
Hein, A., Self organizing maps for evaluation of compositional data of archaeological ceramics measured with portable energy dispersive XRF, in A. Hein, I. Karatasios and V. Kilikoglou (eds.) Proceedings 4th Conference on Computer Applications and Quantitative Methods in Archaeology Greek Chapter (CAA-GR), Athens, forthcoming.