FAIR collaboration at scale: the BRAIN Initiative Cell Census Network Data Ecosystem

29 November 2022

A new preprint describes the BRAIN Initiative Cell Census Network (BICCN), a major five-year effort within the BRAIN Initiative that pioneers collaboration based on open and FAIR research practices. The BICCN is committed to implementing practices and technologies that make data and other research products FAIR. Many of the authors are members of the INCF community, including Maryann Martone (INCF GB Chair), Michael Hawrylycz, Giorgio Ascoli, Jan Bjaalie, Satrajit Ghosh, Anita Bandrowski, Benjamin Dichter, James Gee, Tom Gillespie, Yaroslav O Halchenko, Dorota Jarecka, Lydia Ng and Maja Puchades.

The NIH BRAIN Initiative is a large-scale US effort aiming to accelerate neuroscience research. In 2017, the BICCN project was launched under the BRAIN Initiative to work on systematic multimodal cell type profiling and characterization of the whole mouse brain. The effort was meant to serve as a proof of concept for the same sort of extensive charting on a much larger scale, in human and non-human primate (NHP) brains.

Today, BICCN is an integrated collaborative network of data generating centers, data archives and data standards developers. A dedicated data center (the BRAIN Cell Data Center, or BCDC) is responsible for managing and integrating data across the ecosystem.

The data workflow within BICCN includes three major components: work in individual centers, ingestion and storage in dedicated archives, and listing in the BCDC data catalog and portal.

BICCN data
Each BICCN project has contributed publicly accessible data to a multimodal classification of cell types based on transcriptomic, epigenetic, proteomic, morphological, connectivity, anatomic distribution, and physiological signatures of cells. Within BICCN, data have been released openly on a quarterly basis, and all data except sensitive human data have been shared under an open (CC-BY-4.0) license.

Data generated within BICCN are submitted to one of four BRAIN archives depending on data type(s): Neuroscience Multi-Omic Data Archive (NeMO), Brain Imaging Library (BIL), Distributed Archives for Neurophysiology Data Integration (DANDI) for neurophysiology data, and Brain Observatory Storage Service and Database (BossDB) for electron microscopy ultrastructural datasets. Datasets are indexed and referenced by the BCDC, which provides a portal for accessing the consortium’s data, tools, and knowledge. In addition to the ongoing production of BICCN datasets and resulting publications by individual laboratories, six active BICCN Working Groups are presently engaged in collaborative projects (BICCN 2.0) integrating and interpreting new and existing data.
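The routing of data to archives by data type can be pictured as a simple lookup. The following is a minimal sketch only: the data-type labels, the table name and the function name are hypothetical and are not part of any BICCN or BCDC tool. The labels for NeMO and BIL are inferred from the archive names.

```python
# Hypothetical lookup table. The data-type labels are illustrative and are
# not an official BICCN vocabulary; only the archive abbreviations come
# from the text above.
ARCHIVE_BY_DATA_TYPE = {
    "multi-omic": "NeMO",
    "imaging": "BIL",
    "neurophysiology": "DANDI",
    "electron-microscopy": "BossDB",
}


def archive_for(data_type: str) -> str:
    """Return the archive abbreviation for a data-type label (case-insensitive)."""
    key = data_type.strip().lower()
    try:
        return ARCHIVE_BY_DATA_TYPE[key]
    except KeyError:
        # Unknown labels are rejected rather than guessed at.
        raise ValueError(f"no archive known for data type {data_type!r}") from None
```

For example, `archive_for("Neurophysiology")` returns `"DANDI"`, while an unrecognized label raises `ValueError`.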

BICCN data archiving & portal
The BCDC supports logistical organization, data integration and the development of common data standards, and serves as a central resource for sustaining, comparing and reanalyzing data. It provides public access to the data, data organization, tools, and knowledge derived by the network. It supports the acquisition of data by providing data models and frameworks for importing structured data, and by establishing community standards for the description and management of single-cell data modalities.

BICCN data levels
BICCN data and structured datasets are classified according to Data Levels, running from Level 0 (raw data) to Level 4 (integrated datasets with biologically relevant annotation). When the projects were awarded, each BICCN investigator specified the levels of data that their project would generate, and BICCN working groups later collectively reconciled these definitions.

  • Level 0: Primary raw data directly from individual laboratories running specific assay platforms
  • Level 1: QC/QA-validated data with appropriate associated metadata
  • Level 2: Linked data that are associated with a specific brain region or nucleus
  • Level 3: Datasets with computed features
  • Level 4: Integrated datasets with biologically relevant annotation and comparison with other sources
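Because the levels are ordered, they map naturally onto an ordered enumeration. The sketch below is illustrative only: the class, member and function names are hypothetical, and only the numbering 0 to 4 comes from the list above.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """Hypothetical encoding of the five BICCN data levels."""

    RAW = 0         # primary raw data from individual laboratories
    VALIDATED = 1   # QC/QA-validated data with associated metadata
    LINKED = 2      # data linked to a specific brain region or nucleus
    FEATURES = 3    # datasets with computed features
    INTEGRATED = 4  # integrated, annotated datasets


def levels_at_or_above(minimum: DataLevel) -> list[DataLevel]:
    """Return every level that is at least as processed as `minimum`."""
    return [level for level in DataLevel if level >= minimum]
```

Using `IntEnum` keeps the numeric level and a readable name together, so comparisons such as `DataLevel.RAW < DataLevel.LINKED` work directly.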

BICCN data from the mouse brain is mapped to a standard coordinate framework, the Allen Mouse Common Coordinate Framework, which is also used by the European infrastructure EBRAINS. It serves as the main anatomic data browser and spatial coordinate environment for mouse data within BICCN.

Reproducibility within BICCN
The BICCN network aims to facilitate reproducibility in several ways. Multimodal analysis is performed by cross-institution analysis working groups. A cloud-based data processing platform provides standardized computational pipelines (which use a consistent standard file schema, standardized quality control metrics and metadata) and an environment for reproducible science across groups. An Infrastructure and Standards Development group develops the software needed, formalizes cross-modality standards, and specifies data structures and protocols.

The BICCN has developed many tools and applications to work with BICCN data, including:

  • The Cell Type Knowledge Explorer (RRID:SCR_022793), an interactive application that aggregates multimodal BICCN data from the primary motor cortex (MOp) atlas at the level of individual cell types in mouse, human and marmoset. A data-driven ontology enables text-based search of the data by linking to a well-established body of knowledge on neurobiology.
  • NeMO Analytics (RRID:SCR_018164), a web-based suite of data visualization and analysis tools that allows users to explore single-cell, single-nucleus and spatial transcriptomic and epigenetic profiling data.
  • The Mouse Connectome Project (RRID:SCR_004096), which offers connectivity data for over 10,000 neural pathways in more than 4,000 experimental cases. Its combination of injection strategies can simultaneously reveal key connectivity information for a given brain region and enable construction of detailed connectivity maps of different functional systems in the mouse brain.
  • The Brain Architecture Project (RRID:SCR_004283), with high-resolution 2D images from BICCN collaborators. Datasets are served on pages specific to each species and experiment type.
  • The Brain Data Standards Ontology, an ontology of cell types defined in the BICCN MOp that extends the Cell Ontology (CL) to provide a more detailed set of terms for FAIR-compliant annotation than previously available.

BICCN Standards, Best Practices and Recommendations include metadata and file formats, common processing pipelines, spatial and semantic standards, and identifier systems. 

BICCN Working Groups focused on harmonizing protocols, data formats and metadata for transcriptomic, physiological, and anatomical data types. BICCN-developed standards are available through a public GitHub repository.

The BCDC coordinated the formation of working groups of consortium members, which considered what standards and best practices were needed for new experimental technologies that did not yet have them, including QC criteria for a given modality. The BCDC was also responsible for developing strategies to harmonize common metadata across the archives, including submission checklists, collections metadata and basic descriptive information for specimens.

The Metadata and Infrastructure Working Group, with representatives from the BCDC, the four BRAIN archives housing BICCN data, and BICCN investigators, coordinated the adoption and development of the technical standards needed to support FAIR data.

Additional standards were adopted over the course of the project as they became available, e.g. the Essential Metadata for 3D Microscopy standard developed with support from the BRAIN Initiative (currently under consideration for endorsement as an INCF standard).

Read more:
BICCN preprint: The BRAIN Initiative Cell Census Network Data Ecosystem: A User’s Guide 
BICCN portal
BICCN Working Groups