Colectica Awarded an NIH Grant for Variable Harmonization and Concordance

Colectica, a Minneapolis-based leader in standards-based data documentation, has received a Phase I Small Business Innovation Research grant from the National Institute on Aging (National Institutes of Health).

The goal of the grant is to develop human-in-the-loop algorithms to operate as a "recommendation engine" to guide the concordance of potentially equivalent or similar variables among multiple datasets.

"The current research data environment provides many opportunities for linking similar topical datasets and harmonizing extant common variables, but few software tools are available to facilitate this resource-intensive task," said Dan Smith, the project PI and a co-founder at Colectica. "This project will use an open-standards framework to assemble richly-described datasets that are mapped against the NIH Common Data Elements (CDE) library to identify equivalent concepts and variables. Machine learning will guide data managers through the process and produce variable crosswalks that will aid harmonization and discoverability both within and across studies and datasets."

The software being developed will use machine learning and advanced text analysis algorithms to guide the creation of concorded databases (variable crosswalks) that support harmonization and discoverability, both within and across aging-related statistical datasets. Additionally, the prototype will use an open-standards metadata framework to produce richly-described concordance databases that are interoperable, citable and FAIR.
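
As a rough illustration of what a human-in-the-loop suggestion step could look like, the sketch below ranks candidate variable matches between two studies by the word overlap (Jaccard similarity) of their labels, leaving the final decision to a data manager. This is not Colectica's method: the scoring technique, the function names, the threshold, and the sample variables are all assumptions chosen for illustration.

```python
import re


def tokens(label):
    """Lowercase a variable label and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def jaccard(label_a, label_b):
    """Share of distinct words that two labels have in common (0.0 to 1.0)."""
    ta, tb = tokens(label_a), tokens(label_b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def recommend(source_vars, target_vars, threshold=0.5):
    """Suggest target variables for each source variable, best match first.

    Only candidates scoring at or above the (assumed) threshold are kept.
    The result is a list of suggestions for a person to review, not a
    finished crosswalk.
    """
    suggestions = {}
    for s_name, s_label in source_vars.items():
        scored = [
            (jaccard(s_label, t_label), t_name)
            for t_name, t_label in target_vars.items()
        ]
        kept = [pair for pair in scored if pair[0] >= threshold]
        kept.sort(key=lambda pair: (-pair[0], pair[1]))
        suggestions[s_name] = [(t_name, round(score, 2)) for score, t_name in kept]
    return suggestions


# Hypothetical variable names and labels from two made-up studies.
study_a = {"AGE": "Age of respondent in years", "SEX": "Sex of respondent"}
study_b = {
    "R_AGE": "Respondent age in years",
    "GENDER": "Respondent sex",
    "INC": "Household income",
}

for source, candidates in recommend(study_a, study_b).items():
    print(source, "->", candidates)
```

A production system would likely rely on richer signals than label wording, such as value codes, question text, and mappings to shared concept libraries, but the workflow has the same shape: the software proposes ranked candidates and a person confirms or rejects them.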

Colectica has a track record of creating open-standards-based software tools that reduce data management burden by automatically extracting structured metadata from macro-level (study) and micro-level (variable) characteristics of aging studies. All of the tools developed during this project will use open standards to allow for interoperability with other tools. Colectica products, training on open standards, documentation, and customized software solutions are available from the company.

About Colectica

Launched in 2010, Colectica® is the fastest way to design, document, and publish statistical research using Open Data standards. The Colectica Platform is an ideal solution for statistical agencies, survey research groups, public opinion research, data archivists, and other data-centric collection operations that are looking to increase the expressiveness and longevity of the data collected through standards-based metadata documentation. The company offers a range of highly specific products and services designed to give power to people through easy integration and access to data.

About the National Institute on Aging (NIA)

NIA leads the U.S. federal government effort to conduct and support research on aging and the health and well-being of older people. Learn more about age-related cognitive change and neurodegenerative diseases via NIA’s Alzheimer's and related Dementias Education and Referral (ADEAR) Center website. Visit the main NIA website for information about a range of aging topics, in English and Spanish, and stay connected.

About the National Institutes of Health (NIH)

NIH, the nation’s medical research agency, includes 27 Institutes and Centers and is a component of the U.S. Department of Health and Human Services. NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and is investigating the causes, treatments, and cures for both common and rare diseases. For more information about NIH and its programs, visit the NIH website.


Colectica is a registered trademark of Colectica and/or its affiliates. Other names may be trademarks of their respective owners.

Disclaimer: Research reported in this press release is supported by the National Institute on Aging of the National Institutes of Health under the award number 1R43AG085861-01. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.