It happens a thousand times a day. Every time someone receives medical care, a record of that care is written. Taken together, a LOT of medical records are written each day. And over time, these records form an impressive and important collection of information.
A Wealth of Information
For many reasons, there is a wealth of information locked in those medical records. The information is important to the patient. It is important to the doctor. It is important to clinical care. It is important to research. It is important to insurers. And it is important to the community of medical providers, in general.
As valuable as these medical records are, unlocking the secrets found in them is challenging. A series of land mines awaits anyone who attempts to unlock those secrets.
Any one human being simply cannot sit down and read these records and make sense of them. There are far too many records to digest. And making sense of them requires a specialized background that few people have. Nevertheless, there is a wealth of information hiding in those records.
So, what are the obstacles to unlocking the secrets found in medical records? There are a surprising number of formidable obstacles that must be overcome by the analyst looking for treasure.
The Diversity Of Sources
The first challenge stems from the fact that there are many sources for these records. Medical records are generated by many different kinds of medical care providers.
Not only are a lot of records generated, but the terminology found in the records can vary widely as well. Then there is the problem of bringing the records together in a cohesive, intelligible manner. The sheer number and diversity of the organizations creating medical records presents a challenge in itself.
The Diversity Of Vendors
Another challenge is that medical records are often created by different vendors. Each vendor has its own language, format, and style. Blending together records from different vendors of medical record technology is not an easy or automated thing to do.
The Volume Of Records
As if there were not enough challenges, another major challenge is that of dealing with the volume of records. Take the number of times you have to blend records together and multiply that by a HUGE number and you have a difficult task. Even the simplest of tasks that has to be repeated millions of times presents a challenge.
Unlocking the secrets in your medical records becomes a monumental challenge.
Good News
But there is good news. All of these obstacles can be overcome in a reasonable, economically feasible, time-sensitive manner. In short, there is a path through this medical record jungle.
The story of finding the path through the jungle begins with the observation that, at the end of the day, all medical records are cast into the form of text. It doesn't matter how old the records are, who the vendor is, or what medical discipline provided the care. Ultimately, all medical records are in the form of text.
The path to analysis recognizes that text comes from different sources. One source is medical records that exist only on paper. In order to read and analyze medical records in this form, it is necessary to pass the paper through OCR (optical character recognition) technology. Once the paper has been passed through OCR, the result is an electronic record. Textual ETL then reads the electronic medical record and creates a data base.
The other path to creating the analytical data base record is simply for textual ETL to read the electronic record. Once the electronic record is read, the data base is created.
Note that textual ETL does not care what technology is used to create the electronic record. The record can come from Epic or some other source — textual ETL only cares that the record be in the form of text.
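As a rough illustration of those two paths, here is a minimal sketch in Python. It assumes the pytesseract and Pillow libraries for the OCR step; the function name and file handling are illustrative only, not Forest Rim's actual interface.

```python
# A minimal sketch of the two ingestion paths, assuming pytesseract and
# Pillow are available for the OCR step. Illustrative only; this is not
# Forest Rim's actual interface.
from pathlib import Path

def load_record_text(path: str) -> str:
    """Return the raw text of a medical record, whatever its source."""
    p = Path(path)
    if p.suffix.lower() in {".png", ".jpg", ".jpeg", ".tif", ".tiff"}:
        # Path 1: a scanned paper record must be passed through OCR first.
        from PIL import Image
        import pytesseract
        return pytesseract.image_to_string(Image.open(p))
    # Path 2: an electronic record is already text and can be read directly.
    return p.read_text(encoding="utf-8", errors="replace")
```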
Once the data base is created, Forest Rim supplies the text analytic workbench. The text analytic workbench is used to:
- Select the text that is needed for analysis
- Analyze the text
- Store the results for future analytical activity.
The text analytic workbench operates at electronic speeds and an analysis can be completed in a matter of seconds.
This is the path through the forest. This is how you can start to do analytical processing of medical records in seconds.
Textual ETL is the technology that reads the raw text and edits, converts, and loads the text into a data base. The text analytics workbench is the technology that reads the data base and allows analysis to be done on the text.
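To make that division of labor concrete, here is a hypothetical sketch using Python's built-in sqlite3 module: one function plays the role of textual ETL (loading sentences into a data base) and another plays the role of the workbench (selecting text for analysis). The table layout and function names are assumptions for illustration, not the actual products.

```python
# A hypothetical sketch of the division of labor between textual ETL and
# the text analytic workbench. Table layout and function names are
# illustrative assumptions, not the actual Forest Rim products.
import re
import sqlite3

def textual_etl_load(conn: sqlite3.Connection, record_id: str, text: str) -> None:
    """Role of textual ETL: break raw text into rows and load a data base."""
    conn.execute("CREATE TABLE IF NOT EXISTS record_text "
                 "(record_id TEXT, seq INTEGER, sentence TEXT)")
    sentences = [s.strip() for s in re.split(r"[.!?]\s+", text) if s.strip()]
    conn.executemany("INSERT INTO record_text VALUES (?, ?, ?)",
                     [(record_id, i, s) for i, s in enumerate(sentences)])
    conn.commit()

def workbench_select(conn: sqlite3.Connection, term: str) -> list[tuple]:
    """Role of the workbench: select the stored text needed for analysis."""
    return conn.execute(
        "SELECT record_id, sentence FROM record_text WHERE sentence LIKE ?",
        (f"%{term}%",)).fetchall()

# Example: load one record, then pull back every sentence mentioning a procedure.
conn = sqlite3.connect(":memory:")
textual_etl_load(conn, "rec-001",
                 "Patient underwent appendectomy. Recovery was uneventful.")
print(workbench_select(conn, "appendectomy"))
```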
Text – The Lowest Common Denominator
Text then becomes the lowest common denominator among all medical records. That’s the good news. The fact that there even is a lowest common denominator is indeed very good news.
But just because there is a lowest common denominator – text – among the different medical records, does not mean that finding commonality and communication among the records is an easy or natural thing to do. In other words, you can’t just strip the medical record of its text and combine it with other medical records. The problem of combining the text of different medical records is a challenge in and of itself.
If you just combine a bunch of text together, you can end up with an indecipherable mess.
Why There is a Problem
So why exactly do you end up with a mess on your hands when you just randomly merge records together? In fact, there are a LOT of reasons for this phenomenon.
Different Words, Same Meaning
One of the many reasons is that medical terminology is full of different names that mean the same thing; you can't just throw text together and expect useful results. As a simple example, consider the medication Lasix. Lasix is also known as furosemide. And Lasix is also sold as Lo-Aqua. These are all valid names for the same thing. If you are going to have a cogent analysis, you have to recognize that the same item is being discussed under different names.
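Here is a minimal sketch of that kind of name resolution, using a hand-built synonym table. A real system would draw on a full drug vocabulary; this tiny mapping is only an assumption for illustration.

```python
# A minimal sketch of drug-name resolution using a hand-built synonym
# table. A real system would use a full drug vocabulary; this tiny
# mapping is only for illustration.
DRUG_SYNONYMS = {
    "lasix": "furosemide",
    "lo-aqua": "furosemide",
    "furosemide": "furosemide",
}

def normalize_drug_name(term: str) -> str:
    """Map any known name for a drug onto a single canonical name."""
    return DRUG_SYNONYMS.get(term.strip().lower(), term)

assert normalize_drug_name("Lasix") == normalize_drug_name("furosemide")
```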
Common Formatting
Another mundane issue is that of common formatting of common variables. Take something as simple as a date. In one document, the date is formatted as 07/20/1945. In another, the date appears as July 20, 1945. These are logically the same date, but they have very different physical presentations. When a person reads both documents, they know the dates are the same. But when a computer reads the same documents, the computer must be told that the dates are the same. And when you multiply this translation by 10,000,000, the task of reconciling like dates becomes non-trivial.
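A sketch of how such dates might be brought into one format with Python's standard datetime module follows. The list of accepted formats is just an assumption; a real record set would need many more patterns.

```python
# A sketch of date normalization using the standard datetime module.
# The list of accepted formats is an assumption; real records would
# need many more patterns.
from datetime import date, datetime

DATE_FORMATS = ["%m/%d/%Y", "%B %d, %Y"]  # e.g. 07/20/1945 and July 20, 1945

def normalize_date(raw: str) -> date:
    """Parse a date written in any known format into one canonical form."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

assert normalize_date("07/20/1945") == normalize_date("July 20, 1945")
```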
The Same Classification, Different Terminology
A similar issue is that of naming classifications. In the case of furosemide and Lasix, the reference was to a specific substance. The same issue arises when we speak of classifications of objects (often called “metadata”). As a simple example, there are many kinds of drugs and there are many kinds of medications. But for the purposes of medical care, drugs and medications are the same thing.
In order to have meaningful dialogue between different medical documents, this difference needs to be resolved.
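The same idea applies one level up, to the labels themselves rather than the values. A sketch follows; the class names and the mapping are purely illustrative assumptions.

```python
# A sketch of resolving classification labels (metadata) rather than
# values. The class names and mapping are illustrative assumptions.
CLASS_SYNONYMS = {
    "drug": "medication",
    "drugs": "medication",
    "medication": "medication",
    "medications": "medication",
}

def normalize_class_label(label: str) -> str:
    """Map equivalent classification labels onto one canonical class."""
    return CLASS_SYNONYMS.get(label.strip().lower(), label.strip().lower())

assert normalize_class_label("Drugs") == normalize_class_label("medication")
```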
The Same Data, Different Presentation
Yet another important difference lies in the simple formatting of variables. In one case, blood pressure is written as diastolic/systolic. In another, blood pressure is written as systolic/diastolic. This is a simple condition to correct. But in order to correct it, the condition has to be recognized. Then the correction needs to be repeated 10,000,000 times. When you multiply even a simple condition by numbers like these, something simple becomes something complex.
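Once the condition is recognized, the correction itself is straightforward, as in the sketch below. It assumes each source's ordering convention is known in advance; the source labels are hypothetical.

```python
# A sketch of normalizing blood pressure readings to a single
# systolic/diastolic order. The source labels are hypothetical; in
# practice the ordering convention of each source must be known.
def normalize_bp(reading: str, source_order: str) -> tuple[int, int]:
    """Return (systolic, diastolic) regardless of how the source wrote it."""
    first, second = (int(x) for x in reading.split("/"))
    if source_order == "diastolic/systolic":
        return second, first
    return first, second

assert normalize_bp("120/80", "systolic/diastolic") == \
       normalize_bp("80/120", "diastolic/systolic")
```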
Homographic Resolution
Yet another source of confusion in making sense of many documents is recognizing the meaning of acronyms. In two medical documents, the term “HA” appears. In one document, “HA” refers to heart attack. In another, “HA” refers to headache. And in yet another document, “HA” refers to hepatitis A. If a proper interpretation is not made, the analysis will assume that a headache is a heart attack, and that surely leads nowhere productive.
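One crude way to resolve such homographs is to look at the words that surround the acronym. In the sketch below, the context keywords are only illustrative assumptions; a production system would use a much richer model of context.

```python
# A crude sketch of homographic resolution: expand an acronym based on
# the words that surround it. The context keywords are illustrative
# assumptions; a production system would use a richer context model.
HA_EXPANSIONS = {
    "heart attack": {"cardiac", "chest", "troponin", "ecg"},
    "headache": {"migraine", "aura", "ibuprofen", "tension"},
    "hepatitis a": {"liver", "jaundice", "vaccine", "hepatitis"},
}

def expand_ha(sentence: str) -> str:
    """Guess which sense of 'HA' a sentence uses from its other words."""
    words = set(sentence.lower().split())
    best = max(HA_EXPANSIONS, key=lambda sense: len(words & HA_EXPANSIONS[sense]))
    return best if words & HA_EXPANSIONS[best] else "HA (ambiguous)"

print(expand_ha("Patient reports HA with aura relieved by ibuprofen"))  # headache
```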
Misspellings & Colloquialisms
Another mundane problem is the interpretation of misspellings and colloquialisms. For a variety of reasons, proper spelling and the use of common language foster understanding. While such corrections are usually easy to make individually, in the face of having to make edits and corrections 10,000,000 times, the edit is no longer trivial.
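Here is a sketch of one way such corrections might be automated, using the standard library's difflib to snap misspelled terms onto a known vocabulary. The vocabulary here is only an illustrative assumption.

```python
# A sketch of misspelling correction using the standard library's
# difflib to snap unknown terms onto a known vocabulary. The vocabulary
# is only an illustrative assumption.
import difflib

VOCABULARY = ["furosemide", "hypertension", "appendectomy", "diastolic"]

def correct_spelling(word: str) -> str:
    """Replace a word with its closest vocabulary match, if one is close enough."""
    matches = difflib.get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else word

print(correct_spelling("furosimide"))  # -> furosemide
```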
Common Measurements
Another obstacle to meaningful combination of text is the commonality of basic measurements. Suppose one medical report lists a person’s weight as 130 pounds. Another report lists the person’s weight as 59 kilograms. In order to do incisive analysis, there needs to be one common measurement of a person’s weight. (Logically the weights are the same. But physically they are not. In order to do proper analysis, the weights must be physically reconciled.)
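A sketch of resolving weights to one physical unit follows. The conversion factor is the standard pound-to-kilogram factor; the simple unit parsing is an assumption for illustration.

```python
# A sketch of resolving weights to a single unit (kilograms). The
# conversion factor is standard; the simple unit parsing is an assumption.
KG_PER_POUND = 0.45359237

def weight_in_kg(value: float, unit: str) -> float:
    """Express a weight in kilograms, whichever unit the record used."""
    unit = unit.strip().lower()
    if unit in {"lb", "lbs", "pound", "pounds"}:
        return value * KG_PER_POUND
    if unit in {"kg", "kgs", "kilogram", "kilograms"}:
        return value
    raise ValueError(f"Unknown weight unit: {unit!r}")

# 130 pounds and 59 kilograms describe (nearly) the same weight.
print(round(weight_in_kg(130, "pounds"), 1), weight_in_kg(59, "kg"))  # 59.0 59
```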
None of these issues is unresolvable. But the fact that all of them need to be resolved, and that the resolution must occur over many, many documents, means that lumping together the text from medical records is a non-trivial process.
Textual ETL
The good news is that there is a way to accomplish exactly what has been described in an efficient, cost-effective manner. The method is through technology called textual ETL. Textual ETL reads medical records and transforms those records into a standard data base. In creating the data base, terms and measurements are standardized into a common format and meaning.
Because textual ETL does things on a computer and in an automated manner, there is no limit to the number of records that can be processed. Textual ETL frees the doctor from having to manually read records.
And because the processing is done on a computer, it is fast and inexpensive.
One way to think of textual ETL is as a means of reading a document, finding all the important data, removing the extraneous data, and placing the data in a data base. Suppose you want to find language about a procedure. You don’t have to read the entire document. You let the computer read and organize the document. Now, finding the text that you want is easy and efficient for even the largest document. And textual ETL removes the clutter that isn’t relevant to the substance of the document.
Because textual ETL has built into it the ability to automatically edit and transform text, you can now bring together text from different disciplines. That is easy and natural in textual ETL.
Furthermore, textual ETL does not care about who the provider of the text is. The vendor providing the medical records is irrelevant to the analytics that can be done against the records. The records can be very old records that are combined with very new records. It simply doesn’t matter to textual ETL. The only requirement for textual ETL is that the records be in the form of text.
But perhaps the biggest challenge in creating the database for analytical processing is doing the detailed recognition and editing along the way. As textual ETL reads the raw text and creates the database, it can edit and transform the data so that when it arrives in the database, it arrives in an integrated manner. This means that meaningful analysis can be done on the database immediately.
But perhaps the single most advantageous feature of textual ETL is that there is no limit to the number of medical records that can be processed. Furthermore, reading and processing medical records in an automated fashion is not expensive.
The result is that now, for the first time, you can start to do analytical processing against medical records. What once was an expensive, laborious, error-prone activity is now fast, inexpensive, and accurate.
That is why it is said that you can start to do analytical processing against medical records today like you have never been able to do it before.