How to Address Health Equity Issues in a Proactive Manner

Data sources needed to compute SDoH risk analysis composite scores

By Rahul Sharma

Value-based benefits administration (VBBA) is about leveraging social determinants of health (SDoH) to implement value-based care (VBC). True and successful VBBA operationalizes the VBC continuum of capabilities that exist where VBC and precision health intersect to create value-based plan design.

Leveraging SDoH for VBC requires proper measurement and interpretation of available SDoH data. It is a complex problem due to the variety of structured/semi-structured/unstructured data sets coming from multiple sources, the lack of standardization in data collection and processing, as well as the need to capture a very large number of demographic, environmental, and socioeconomic metrics not yet measured today.

Socioeconomic factors within a given population have a significant influence on the SDoH-related needs of that population. Decades of research have shown that populations with a lower socioeconomic status tend to have lower life expectancy and higher engagement in unhealthy behaviors like drinking or smoking. These populations also often battle high rates of chronic health issues such as mental illness and obesity, primarily due to problems accessing proper healthcare.

Communication barriers (especially for populations that speak English as a second language), along with issues related to transportation, affordable housing, stable employment, proper nutrition, quality of housing, and proximity to crime and violence, are among the longer list of environmental factors that affect the overall health of individuals and place a massive drain on the healthcare ecosystem.

The first step in resolving any problem is acknowledging it and identifying the key factors leading to it. The next step is determining how to prospectively identify the population that is experiencing the problem. The outcome of this analysis leads to a set of customized steps that, when taken, address the various issues at a local and national level.

The impact of SDoH and socioeconomic status on the health of the population has long been acknowledged. Key domains have been identified for categorization of SDoH impact areas, with many local and national firms partnering with community-based and social organizations, along with payers and providers, to try to address these issues. However, there is no easy way to create a composite risk score with enough granularity to help with the analysis of the population. The issue is not just collecting data from disparate silos but also digitizing health records that exist as unstructured data sets. Such data sets also need to be correlated with publicly available data sets (e.g., census and economic data) to develop a mechanism for defining a composite risk score for individuals and populations, enabling analysis at the ZIP code level as well as roll-up to state and national levels.

Before we can apply statistical models to the data sets to develop composite risk scores for patients, we need to look at the available external and internal data sets (i.e., from both public and private sources) and the formats in which they are available (i.e., to determine whether a digitization step of data engineering is needed first). We then need to correlate data points between the data sets to ascertain the entities, attributes, risk category domains, and measures across which the analysis needs to be done. (See Figure 1.)


Figure 1 (Publicly available external data sets [and their SDoH domains])

# | Domain | Data Sources / Key Insights to Uncover
1 | Economic & employment | Employment (public/private) levels & stability, poverty level mapping and their correlation to health conditions. Key components: income, wealth, cost of living, poverty, economic development, financial services, and exploitation
2 | Education | Distribution of education levels and their correlation to health conditions
3 | Environmental | Rural vs. urban areas, income and education ranges
4 | Housing | Homeless population, housing conditions (types of homes, location, access to medical and social care)
5 | Transportation | Transportation availability; traffic patterns; safety from a transportation perspective; crime, public transportation, and economic issues
6 | Medical | Insured vs. uninsured, claims data, data from charts and notes, chronic conditions, ER visits, demographic data
7 | Political | ZIP code and census data coupled with political ideology impact on populations

To achieve maximum value and insight, the external data sets should be combined with the following patient-specific data sets:

  • EHR software data: immunization/vaccination records, visit records, charts, and notes
  • Claims/remittances and healthcare utilization data
  • Employer data
  • Eligibility and benefits data
  • Pharmacy data
  • Clinical lab and disease registry data: structured and unstructured clinical data in the form of documents, images, free-form text, and videos
  • IoMT (internet of medical things) data from different devices and apps (e.g., output from remote monitoring devices/personal health trackers, etc.)
  • EVV (electronic visit verification) data sets from homes/facilities
  • Consent management data
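
As a rough illustration of combining external and patient-specific data sets, the sketch below attaches ZIP-level public indicators to patient records and then rolls a measure up from ZIP code to state. All field names, ZIP codes, and values are hypothetical placeholders, not real data or a prescribed schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ZIP-level public indicators (e.g., derived from census data).
zip_indicators = {
    "30301": {"state": "GA", "poverty_rate": 0.22, "uninsured_rate": 0.15},
    "30302": {"state": "GA", "poverty_rate": 0.10, "uninsured_rate": 0.08},
    "10001": {"state": "NY", "poverty_rate": 0.12, "uninsured_rate": 0.05},
}

# Hypothetical patient-level records (e.g., from claims or EHR extracts).
patients = [
    {"patient_id": "p1", "zip": "30301", "er_visits": 3},
    {"patient_id": "p2", "zip": "30301", "er_visits": 1},
    {"patient_id": "p3", "zip": "30302", "er_visits": 0},
    {"patient_id": "p4", "zip": "10001", "er_visits": 2},
    {"patient_id": "p5", "zip": "99999", "er_visits": 4},  # ZIP with no public data
]


def enrich(patients, zip_indicators):
    """Attach ZIP-level indicators to each patient record (left join on ZIP)."""
    return [{**p, **zip_indicators.get(p["zip"], {})} for p in patients]


def roll_up(records, key, measure):
    """Average a measure per group; records missing the key are skipped."""
    groups = defaultdict(list)
    for r in records:
        if key in r and measure in r:
            groups[r[key]].append(r[measure])
    return {k: mean(v) for k, v in groups.items()}


enriched = enrich(patients, zip_indicators)
by_zip = roll_up(enriched, "zip", "er_visits")
by_state = roll_up(enriched, "state", "er_visits")
```

Patients whose ZIP code has no matching public record keep their own fields and are simply left out of the state-level roll-up; a real pipeline would need an explicit policy for such gaps.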

By relating these data sets, we can create statistical models with clearly defined dimensions and granularity at the local level, from which we can roll up to state and federal levels. The measures (i.e., the variables upon which the statistical models are built) are then used for principal component analysis and confirmatory factor analysis, which yield index scores. Each index score can then be used as a covariate in the multivariable regression stage of the analysis to determine how much of the variation in different outcomes is attributable to the SDoH risk score versus clinical risk. The outcome of such an analysis allows not only for slicing and dicing the impact on measures across the dimensions, but also for deriving a composite risk score for a patient.
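
To make the index-score step concrete, here is a minimal sketch of deriving a composite index from the first principal component of standardized measures. It uses plain-Python power iteration rather than a statistics package, and the three measures and their values are hypothetical. Confirmatory factor analysis and the regression stage are not shown.

```python
import math
from statistics import mean, pstdev

# Hypothetical area-level measures, oriented so that higher means more risk:
# [poverty_rate, uninsured_rate, unemployment_rate]
measures = [
    [0.22, 0.15, 0.09],
    [0.10, 0.08, 0.04],
    [0.12, 0.05, 0.05],
    [0.30, 0.20, 0.12],
    [0.05, 0.03, 0.03],
]


def standardize(rows):
    """Z-score each column so measures on different scales are comparable."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def covariance(z):
    """Population covariance of standardized data (the correlation matrix)."""
    n, k = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(k)] for i in range(k)]


def first_component(cov, iters=500):
    """Leading eigenvector via power iteration (assumes the uniform start
    vector is not orthogonal to it, which holds for positively correlated
    measures)."""
    k = len(cov)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v if sum(v) >= 0 else [-x for x in v]  # fix the sign convention


def composite_scores(rows):
    """Project standardized rows onto the first principal component."""
    z = standardize(rows)
    loadings = first_component(covariance(z))
    return [sum(a * b for a, b in zip(row, loadings)) for row in z], loadings


scores, loadings = composite_scores(measures)
```

The resulting `scores` are centered on zero, so a positive value marks an area with above-average risk on these measures. One score per area (or per patient, if the rows are patients) is what would then enter a regression as a covariate.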

These approaches are effective, but a key roadblock must be overcome by adopting a new architecture that takes a practical approach to data sharing and privacy. Given today’s constraints, existing medical data is not fully exploited in computing the SDoH risk score, for two reasons:

  1. More than 70% of that data is unstructured (charts/notes, documents, images, audio/video), requiring multimodal processing with a combination of artificial intelligence (AI) and machine learning (ML), plus amalgamation with the structured and semi-structured data sets
  2. The data sits in silos and is surrounded by privacy concerns that restrict access to it

Issue 1 can be addressed by using natural language processing (NLP) and ML to train models on, and process, an entity’s unstructured data in conjunction with the data from its enterprise systems. The resulting data can then be shared with appropriate granularity (the full record or only a subset of attributes) between participants on a permissioned basis via a distributed ledger, with data-sharing rules governed by smart contracts.
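
As a toy illustration of turning free-text notes into structured SDoH signals, the sketch below tags a note with domains using hand-written keyword patterns. This is a deliberately simple rule-based stand-in: a production system would rely on trained NLP/ML models, and the patterns and domain keys here are hypothetical.

```python
import re

# Hypothetical keyword patterns per SDoH domain. Hand-written rules are a
# stand-in for a trained NLP/ML model, used here only for illustration.
DOMAIN_PATTERNS = {
    "housing": re.compile(r"\b(homeless|evicted|eviction|shelter)\b", re.I),
    "transportation": re.compile(r"\bno (car|ride|transportation)\b", re.I),
    "economic": re.compile(r"\b(unemployed|laid off|cannot afford)\b", re.I),
}


def tag_note(note):
    """Return the sorted SDoH domains whose patterns match the note text."""
    return sorted(d for d, pat in DOMAIN_PATTERNS.items() if pat.search(note))


tags = tag_note(
    "Patient was recently evicted, is unemployed, "
    "and has no transportation to the clinic."
)
```

Tags produced this way become structured attributes that can be joined with claims, eligibility, and other enterprise data.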

Issue 2 can be solved, with data privacy and security in mind, by replicating only the pertinent data via smart contracts as “ON” ledger data. Each participating entity benefits from such replication because it can combine this information with its “OFF” ledger data stores. This approach makes the most sense for the use case of SDoH community networks and patient data.
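
The idea of replicating only pertinent, permitted attributes can be sketched as a simple filtering rule. The code below is plain Python, not a smart contract or ledger API; the participant names, attribute names, and rule table are hypothetical and only illustrate the kind of logic such a contract might encode.

```python
# Hypothetical sharing rules: the attributes each participant may receive.
SHARING_RULES = {
    "community_org": {"patient_id", "zip", "housing_need"},
    "payer": {"patient_id", "zip", "housing_need", "chronic_conditions"},
}


def replicate_for(record, participant, consented_participants):
    """Return the permitted subset of a record, or None without consent."""
    if participant not in consented_participants:
        return None
    allowed = SHARING_RULES.get(participant, set())
    return {k: v for k, v in record.items() if k in allowed}


# Hypothetical full record held in an entity's own ("OFF" ledger) store.
record = {
    "patient_id": "p1",
    "zip": "30301",
    "housing_need": True,
    "chronic_conditions": ["diabetes"],
    "clinical_notes": "free-text note kept off ledger",
}

shared = replicate_for(record, "community_org", {"community_org"})
denied = replicate_for(record, "payer", {"community_org"})
```

Here the community organization receives only three attributes, and the payer receives nothing because it is absent from the consent set.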

In addition, the solution to Issue 2 can benefit from a federated learning (FL) approach to ML. FL enables gaining insights collaboratively (e.g., in the form of a consensus model) without moving patient data beyond the firewalls of the institutions in which it resides. The ML process occurs locally at each participating institution, and only model characteristics (e.g., parameters, gradients) are transferred. Rather than being gathered on a single server, the data remains locked within each institution’s infrastructure; only the algorithms and predictive models travel between the servers, never the data.
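
A minimal sketch of this pattern is federated averaging on a one-variable linear model: each site runs gradient descent on its own data, and only the fitted parameters and sample counts are sent for aggregation. The sites, the synthetic data (generated from an assumed relationship y = 1 + 2x), and the hyperparameters are all hypothetical. Real deployments also have to deal with heterogeneous data, secure aggregation, and communication cost.

```python
def local_update(weights, data, lr=0.5, epochs=20):
    """Gradient descent for y ~ w0 + w1*x on one site's private data."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        g0 = sum((w0 + w1 * x - y) for x, y in data) / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0, w1 = w0 - lr * g0, w1 - lr * g1
    return (w0, w1), n  # only parameters and a count leave the site


def federated_average(updates):
    """Average site parameters, weighted by each site's sample count."""
    total = sum(n for _, n in updates)
    return tuple(sum(w[i] * n for w, n in updates) / total for i in range(2))


def train(sites, rounds=50):
    """Alternate local training and central averaging for a fixed number of rounds."""
    global_w = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates)
    return global_w


def make_site(xs):
    """Synthetic private data following the assumed relationship y = 1 + 2x."""
    return [(x, 1.0 + 2.0 * x) for x in xs]


sites = [
    make_site([0.0, 0.5, 1.0]),
    make_site([0.2, 0.8]),
    make_site([0.1, 0.4, 0.6, 0.9]),
]
global_model = train(sites)
```

On this noise-free synthetic data the averaged model converges toward the generating parameters, even though no site ever shares its raw `(x, y)` pairs.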

FL works well for mobile and edge device use cases as well as for ML processes used in research. Its performance with terabytes of transactional data has not yet been proven in real-world healthcare applications, especially for non-research use cases.


A practical deployment model of technology that allows proper use of AI/ML for digitization of data and amalgamation of that data with structured and semi-structured data sets would enhance the computation of a composite risk score for patients. This would help us address health equity issues in a proactive manner rather than remain trapped in a reactive “sick care” model.

Rahul Sharma is the CEO of HSBlox, which enables SDoH risk stratification, care coordination, and permissioned data sharing through its digital health platform.