Healthcare and Life Sciences analytics use-cases simplified

Challenges in HCLS Data processing

The four broad segments of life sciences (genomics, clinical sciences, pharmaceuticals, and proteomics) generate large volumes of data that are difficult to manage. Advanced analytics techniques are used to improve outcomes in areas such as drug discovery, disease understanding, patient engagement, personalized medicine, and product design. However, the data integration and processing challenges listed below keep healthcare and life sciences (HCLS) companies from realizing the full value of digital technologies.

  • Data sits in silos across multiple locations, including public datasets.
  • Data comes in both structured and unstructured forms.
  • Data integration relies on multiple point-product tools.
  • Data pipelines for ingestion and transformation are cumbersome.
  • Data governance and compliance requirements must be met.
  • On-premises compute infrastructure is neither scalable nor economical.

Amorphic - Cloud Data Analytics platform for HCLS

Amorphic is a production-ready data-lake-as-a-service platform that provides self-service tools for data ingestion, preparation, transformation, ML, customized dashboards, and workload prototyping using AWS and third-party analytics services and tools. Some salient points of the Amorphic solution:

  • Cloud-based platform to store and process publicly available HCLS databases and datasets in a single place.
  • Integration with existing AI/ML systems and compatibility with third-party AI/ML libraries and model types (RNNs, GANs, and other neural networks).
  • Scaling of compute and storage resources with pay-as-you-go consumption models.
  • Low-code ETL and AI/ML data pipelines.

The next sections highlight how Amorphic simplifies a few HCLS use cases and shortens time to insight.

Genomics Workflows

The Amorphic platform can provision genomic data pipelines from a single interface. Amorphic jobs can configure genomic workflows that scale and run in parallel for cost-efficient data processing. Amorphic datasets can connect directly to sequencing platforms for ingestion into S3 and Redshift, with querying through Athena, and genomic pipelines can be scaled on demand with a few clicks. Integration with notebooks and dashboards provides customizable, collaborative workspaces for scientists, computational biologists, and bioinformaticians. Genomic workflow tools such as GATK and BLAST are integrated as job templates for building customized genomic data analysis pipelines. Auto scaling and a serverless architecture help reduce resource and engineering costs.
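
As a rough illustration of the parallel, per-sample pattern described above, the Python sketch below builds one GATK HaplotypeCaller command per sample and fans the samples out over a worker pool. It is not Amorphic's job API: the function names, file paths, and sample IDs are hypothetical, and the `runner` argument stands in for whatever actually submits the work (a job template, a batch queue, or a local subprocess call).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def haplotypecaller_cmd(sample_id: str, reference: str,
                        bam_dir: str, out_dir: str) -> List[str]:
    """Build a GATK4 HaplotypeCaller command that emits a per-sample GVCF."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,                          # reference genome FASTA
        "-I", f"{bam_dir}/{sample_id}.bam",       # aligned reads for this sample
        "-O", f"{out_dir}/{sample_id}.g.vcf.gz",  # per-sample output
        "-ERC", "GVCF",                           # emit reference confidence records
    ]


def run_samples(sample_ids: Sequence[str],
                runner: Callable[[List[str]], int],
                reference: str, bam_dir: str, out_dir: str,
                max_workers: int = 4) -> Dict[str, int]:
    """Run one command per sample in parallel; return sample_id -> exit code."""
    commands = [haplotypecaller_cmd(s, reference, bam_dir, out_dir)
                for s in sample_ids]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with sample_ids.
        exit_codes = list(pool.map(runner, commands))
    return dict(zip(sample_ids, exit_codes))


if __name__ == "__main__":
    def dry_run(cmd: List[str]) -> int:
        """Hypothetical runner: print the command instead of executing it."""
        print(" ".join(cmd))
        return 0

    # Sample IDs and paths are placeholders, not real data.
    print(run_samples(["sampleA", "sampleB"], dry_run,
                      "ref.fasta", "bams", "gvcfs"))
```

To execute the commands for real on a machine with GATK installed, the dry-run function could be swapped for something like `lambda cmd: subprocess.run(cmd).returncode`; the fan-out logic stays the same.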

Clinical Data Analysis

The Amorphic platform makes it easier to work with large clinical datasets through automated ingestion, transformation, and analytics that support a variety of clinical data workloads. Amorphic comes with pre-built use cases for handling electronic health records (EHR), medical images, clinical trials, document analytics, wearables, and medical claims datasets. The ability to automate the ingestion, transformation, and management of data makes it easier for healthcare organizations to derive value from their data using analytics and machine learning. Amorphic handles both structured and unstructured data, with integrated profiling and ML services for datasets stored on AWS. Analytics dashboards and machine learning models can be provisioned on top of these datasets through integration with Tableau, Power BI, Spotfire, SageMaker notebooks, and QuickSight. The serverless architecture scales workloads up and down on demand and can handle petabyte-scale (population-scale) datasets.

Chemical Informatics & Drug Discovery

Amorphic scales and speeds up drug discovery and cheminformatics workloads. The Amorphic platform for cheminformatics comes pre-configured with public drug and molecule datasets (ChEMBL, PubChem, and DrugBank), and more public databases can be added on demand. Drag-and-drop, low-code ETL makes it easy to clean, filter, and transform molecular datasets and to build knowledge graphs, interactive dashboards, and collaborative notebooks for analytics. Amorphic connections can ingest experimental data through database APIs, REST APIs, and file systems. S3 storage supports unstructured data such as images, while profiling with Athena allows working directly with structured chemical formats that traditional databases do not support. Amorphic can build workflows that support topology, information retrieval, and data mining. Integration with ML and AI services (third-party and Amazon) enables advanced analytics on Amorphic chemical datasets with a few clicks. The serverless architecture provides a cost-efficient way to scale up and handle large datasets.
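
To make the cleaning and filtering step concrete, here is a minimal Python sketch of what such an ETL stage might do to a molecular dataset: normalize identifiers, drop incomplete or duplicate records, and filter by molecular weight. It only illustrates the idea and does not show how Amorphic's drag-and-drop ETL is implemented. The column names (`inchi_key`, `smiles`, `mol_weight`), the weight cutoff, and the sample rows are assumptions made for the example.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def clean_molecules(rows: Iterable[Dict[str, str]],
                    max_mol_weight: float = 500.0) -> Iterator[Dict[str, object]]:
    """Yield cleaned records, skipping incomplete, duplicate, or too-heavy molecules."""
    seen = set()
    for row in rows:
        key = (row.get("inchi_key") or "").strip().upper()  # normalize the identifier
        smiles = (row.get("smiles") or "").strip()
        try:
            weight = float(row.get("mol_weight") or "")
        except ValueError:
            continue  # missing or non-numeric molecular weight
        if not key or not smiles:
            continue  # incomplete record
        if key in seen or weight > max_mol_weight:
            continue  # duplicate identifier or above the weight cutoff
        seen.add(key)
        yield {"inchi_key": key, "smiles": smiles, "mol_weight": weight}


if __name__ == "__main__":
    # Illustrative rows with placeholder identifiers, not real database records.
    raw = io.StringIO(
        "inchi_key,smiles,mol_weight\n"
        "key-ethanol,CCO,46.07\n"
        "KEY-ETHANOL,CCO,46.07\n"
        "KEY-DECANE,CCCCCCCCCC,142.29\n"
        "KEY-BENZENE,c1ccccc1,\n"
    )
    for record in clean_molecules(csv.DictReader(raw), max_mol_weight=100.0):
        print(record)
```

With the cutoff set to 100, only the first ethanol row survives: the second is a duplicate once the key is upper-cased, the decane row exceeds the cutoff, and the benzene row has no weight.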

Proteomics

The Amorphic platform can scale to process large proteomics datasets, with integration of experimental data, automated data management, drag-and-drop ETL, and provisioning of analytics workspaces with a few clicks. Amorphic can provide on-demand access to HPC resources on AWS to handle structural biology workloads, optimizing cost and speeding up research and development. Amorphic datasets can support biomarker and assay data, using both S3 and Athena to provide search and query capabilities. Integrating multiple datasets, such as biomarkers, organisms, and tissues, can produce information frameworks that simplify research and development. Collaborative workspaces allow data scientists, researchers, and bioinformaticians to work on the same datasets, increasing throughput and productivity. End-to-end integration gives complete visibility from raw data to finished datasets and helps derive more value from proteomics research.
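
For readers curious what querying S3-backed data through Athena looks like at the AWS level, the sketch below uses the standard Athena client calls `start_query_execution`, `get_query_execution`, and `get_query_results`. It is an assumption-laden illustration rather than Amorphic's own interface: the helper name `run_athena_query`, the database and table names, and the S3 output location are all hypothetical, and only the first page of results is read.

```python
import time
from typing import Any, List


def run_athena_query(client: Any, sql: str, database: str,
                     output_location: str,
                     poll_seconds: float = 1.0) -> List[List[str]]:
    """Start an Athena query, wait for it to finish, and return the first page of rows."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)

    result = client.get_query_results(QueryExecutionId=query_id)
    # Each row is a list of cells; NULL cells carry no "VarCharValue" key.
    return [[cell.get("VarCharValue", "") for cell in row["Data"]]
            for row in result["ResultSet"]["Rows"]]


# Example call (requires boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   rows = run_athena_query(
#       boto3.client("athena"),
#       "SELECT biomarker, tissue FROM assays LIMIT 10",
#       database="proteomics_db",
#       output_location="s3://example-bucket/athena-results/",
#   )
```

Passing the client in as an argument keeps the helper easy to test with a stub. For `SELECT` queries the first returned row typically holds the column names, so callers usually skip it before processing data rows.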