The Role of Data Warehouses in Higher Education Technology Strategies


In a previous blog post written by Kestryl Lowrey (Managing Director, Technology Services) and Stephen Earheart (Director, Campaign Services), Cloud for Good explored the role of data warehouses in nonprofit technology strategies.  Through this exploration, we determined that incorporating a data warehouse into a nonprofit’s overall technical landscape and data strategy could help support a lean, performant CRM to help propel staff forward without stashing legacy data in an inaccessible local archive.  Having a warehouse become a record of historical data and a tool for analyzing outcomes and trends is incredibly beneficial to a nonprofit organization.  But what about higher education institutions? 

Institutions utilizing Salesforce across campus are familiar with achieving a 360-degree view of constituents and establishing a single source of truth.  What they might not be familiar with, however, is how a data warehouse or a data lake can be incorporated into their technology strategy and how it can help leverage large data sets to empower users and drive data-backed decision making.   

Higher Education Data Warehousing Use Cases

The role of a data warehouse is to serve as a collection point for an organization’s data. This collection point may serve as a system of record, a source for analytics strategy, a repository of otherwise irreconcilable data structures, a historical archive, or any number of other business functions. A warehouse can support and enhance the functionality of a CRM and should be considered by any organization seeking to develop a mature data strategy.

This is particularly true for institutions relying on Salesforce as their sole long-term data repository. Salesforce is not ideal for long-term management of high-volume data, but its limitations can be addressed through a combination of archival and virtualization strategies, shipping aggregated data between systems, and well-developed integrations that exchange information with a warehouse and/or data lake.

Data Warehousing for Recruitment & Admissions

Higher education institutions inherently generate a high volume of data through the recruiting and admissions processes. Touch points with prospective students, and their interactions with the institution, span several mediums and communication methods (mail, phone, email, etc.). For prospective students who never end up attending the institution, all of that associated data builds up over time.

This data is likely of little use to daily CRM users, and likely of little use to the institution in the long term, but it might need to be retained under data retention policies. The best strategy, in this case, would be to house that data externally in a data warehouse. Should the one-time prospective student resurface down the road, the associated data can be resurfaced too, at record-level detail.
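The archive-and-resurface idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Prospect fields, the 730-day inactivity window, and the in-memory dictionary standing in for a warehouse are all hypothetical, and a real implementation would follow the institution's own retention policy and integration tooling.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Prospect:
    # Hypothetical fields; a real CRM record would carry many more.
    prospect_id: str
    last_activity: date
    enrolled: bool


def split_for_archive(prospects, today, inactive_days=730):
    """Return (keep_in_crm, move_to_warehouse) based on inactivity.

    The 730-day default is an assumed policy, not a recommendation.
    """
    cutoff = today - timedelta(days=inactive_days)
    keep, archive = [], []
    for p in prospects:
        # Only unconverted prospects with no recent activity are archived.
        if not p.enrolled and p.last_activity < cutoff:
            archive.append(p)
        else:
            keep.append(p)
    return keep, archive


def restore(warehouse, prospect_id):
    """Pull one archived record back out at record-level detail, or None."""
    return warehouse.pop(prospect_id, None)


prospects = [
    Prospect("P-001", date(2019, 3, 1), enrolled=False),
    Prospect("P-002", date(2024, 1, 15), enrolled=False),
    Prospect("P-003", date(2018, 9, 1), enrolled=True),
]
keep, archive = split_for_archive(prospects, today=date(2024, 6, 1))

# A dict keyed by ID stands in for the external warehouse here.
warehouse = {p.prospect_id: p for p in archive}
```

In this sketch only P-001 leaves the CRM: P-002 was active recently and P-003 enrolled. If P-001 resurfaces, restore(warehouse, "P-001") returns the full record.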

Data Warehousing for Events and Advancement

Similarly, the influx of registration information for the many events higher education institutions hold is another common candidate for housing in a data warehouse. This high volume of data, again, might not be valuable to retain at record-level detail. Data about someone who attended a fundraising dinner years ago might be useful for measuring the interactions that were made or for charting attendance over time, but the institution may not need the individual record data itself.

This line of thinking also applies to direct mail campaigns and large email blasts.  The metrics and aggregate details pertaining to the success of these campaigns are valuable, but the record level detail can most likely be kept externally in either a lake or a warehouse.  Generally, the information is valuable in the aggregate but not at the individual detail level. 
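As a rough illustration of "valuable in the aggregate," the sketch below rolls record-level event registrations up into per-event, per-year totals. The field names and sample rows are invented for the example; the summary is what might stay in the CRM while the individual rows move to a lake or warehouse.

```python
from collections import Counter

# Hypothetical record-level registrations (illustrative data only).
registrations = [
    {"event": "Fundraising Dinner", "year": 2019, "attended": True},
    {"event": "Fundraising Dinner", "year": 2019, "attended": True},
    {"event": "Fundraising Dinner", "year": 2019, "attended": False},
    {"event": "Fundraising Dinner", "year": 2020, "attended": True},
    {"event": "Fundraising Dinner", "year": 2020, "attended": True},
]


def summarize(rows):
    """Collapse record-level rows into counts per (event, year)."""
    registered = Counter()
    attended = Counter()
    for row in rows:
        key = (row["event"], row["year"])
        registered[key] += 1
        if row["attended"]:
            attended[key] += 1
    return {
        key: {"registered": registered[key], "attended": attended[key]}
        for key in registered
    }


summary = summarize(registrations)
```

The summary keeps enough to chart attendance over time without holding any individual's record in the CRM.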

On the other hand, there are other high-volume data generators, like student registration, that tend to exist within an institution’s CRM for a longer time because of the need for reportability and ease of access. If someone attended actual classes and engaged with the institution over several semesters, that student’s data likely needs to be retained longer than the data of someone who attended an event, so the student’s information would be best kept in the CRM long term.

The student information system (SIS) often serves as the ultimate system of truth for registration and course engagement data. Some of this data may be replicated to Salesforce via integration solutions. However, it may not make sense to build supporting structures for all SIS data in Salesforce. A warehouse can provide flexibility in aligning data that is otherwise irreconcilable, or data that does not make sense to build out fully in multiple systems and support via integration.

How Data Warehouses and Data Lakes Differ

It’s important to note the difference between a data warehouse and a data lake: two separate solutions that serve similar, yet differing, functions.  Generally, data lakes are developed to serve as a universal collection point of unstructured (or less structured) data, with a focus placed on ease of connectivity/data collection.  Data lakes have less importance placed on structure, formatting, and data integrity.  Consequently, this often means that it is more difficult for business users to interact with and extract value directly from data lakes.  Business users typically need to work closely with data scientists to transform the raw data in a data lake into meaningful structures.  Data scientists may leverage BI tools such as Tableau in tandem with a data lake to provide insight to the business users.  

Conversely, a data warehouse tends to have a more rigid structure, a more robustly defined schema, and clearly defined rules about what data can and cannot go into the warehouse. Generally, this translates to more effort in writing data structures to a warehouse compared to a lake. However, it also means that data housed in a warehouse is typically easier for business users to interact with and extract value from. Many institutions might have both a data warehouse and a data lake depending on respective data strategies and accompanying technology stacks. Due to the more rigid structure of a warehouse, business users may employ their own BI tools such as Tableau against a warehouse to provide insight, with or without the support of data scientists.
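The contrast between the two can be shown with a toy example. In this sketch the "lake" accepts any raw record as-is, while the "warehouse" checks each row against a fixed schema before accepting it. The schema, column names, and both write functions are hypothetical stand-ins, not the API of any particular product.

```python
import json

# Assumed warehouse schema: column name -> required Python type.
WAREHOUSE_SCHEMA = {"student_id": str, "term": str, "credits": int}


def write_to_lake(lake, raw):
    """A lake favors easy collection: store the raw record unchanged."""
    lake.append(json.dumps(raw))


def write_to_warehouse(table, row):
    """A warehouse enforces structure: reject rows that break the schema."""
    if set(row) != set(WAREHOUSE_SCHEMA):
        mismatched = sorted(set(row) ^ set(WAREHOUSE_SCHEMA))
        raise ValueError(f"columns do not match schema: {mismatched}")
    for column, expected in WAREHOUSE_SCHEMA.items():
        if not isinstance(row[column], expected):
            raise TypeError(f"{column} must be {expected.__name__}")
    table.append(row)


lake, table = [], []
clean = {"student_id": "S-100", "term": "2024FA", "credits": 3}
messy = {"student_id": "S-101", "notes": "called twice"}

# The lake takes both records without complaint.
write_to_lake(lake, clean)
write_to_lake(lake, messy)

# The warehouse takes the clean row and rejects the messy one.
write_to_warehouse(table, clean)
try:
    write_to_warehouse(table, messy)
    rejected = None
except ValueError as err:
    rejected = str(err)
```

The extra effort lands on the write side of the warehouse, which is why the data that does get in tends to be easier for business users to query directly.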

There’s never been a better time to utilize data warehousing with Salesforce, given that Salesforce.org and Amazon Web Services (AWS) have partnered to develop external data strategies, including data warehouses, to better manage data and accelerate the impact of both higher education institutions and nonprofit organizations. Salesforce is building high connectivity between the two platforms, with a number of intuitive connectors currently in development. Salesforce recognizes organizationally that not everything should live in Salesforce proper, so time and resources are being routed towards development that makes data warehousing easier and more efficient for institutions of all sizes. AWS is not the only option out there; Snowflake’s data warehouse offerings can provide enterprise-level solutions with their own unique advantages.

Better Data Management Through Salesforce

Data warehouses do not exist in silos; they exist in tandem with an institution’s retention, governance, and integration strategies. A data warehouse often supports many business functions within an organization, including data lifecycles and retention policies. A focal point for warehousing is often an archival strategy, or the intentional removal of record-level detail from the CRM to an external system. So, with that in mind, what information should be archived in a warehouse?

A good starting point when establishing data retention policies and data lifecycle strategies is this: any data that is not directly acted on by everyday Salesforce users, but that has value in a greater context (as background information, to support decisions, or to provide direction), should be considered for archival. Individual giving history, summarized digital information, advancement details/summaries, and large-volume direct mail details/summaries are also common candidates for archival/data warehousing. Your CRM users should be interacting with relevant, actionable data. “Stale” data that does not provide value may hinder efficiency and should be considered for archival in a warehouse.

For information on the current and future states of data warehousing on Salesforce, contact Cloud for Good at info@cloud4good.com today. 
