E-Solutions Gain Momentum
by Taren Grom and Denise Myshko

The Forum

Life-sciences executives recognize the benefits of e-solutions, from ROI to increased audience reach to reduced paper files. These solutions are a common part of many programs, whether for R&D, marketing, salesforces, or physician and consumer education.

According to the Pew Internet & American Life Project, eight of 10 Internet users have looked for health information online, with increased interest in diet, fitness, drugs, health insurance, experimental treatments, and particular doctors and hospitals. This translates to about 95 million American adults (18 years old and older) who use the Internet to find health information. There are many reasons why Internet users might now be more likely to search for certain types of health information, according to Pew’s research. Among the reasons why Internet users are now more aware of what is available to them are an increased number of health-related Websites supplying more content, the call by government agencies for obesity awareness and public education about nutrition, the pharmaceutical industry’s marketing campaigns, and more interest in experimental treatments.

E-Relationships
E-solutions providers need to first think about the process that their clients are trying to automate, then provide the technology that supports their needs.

Greenbaum. The No. 1 criterion for choosing suppliers is a proven track record of reaching the physician, specifically the type of physician that a company would like to reach. Reaching the right physician or healthcare provider is the critical success factor for any type of promotional campaign done in the e-space.

Hoyes. We search for suppliers that ultimately can become long-term partners. So that means we closely evaluate the ability of the supplier to track the return on investment (ROI) on our promotional programs in a sophisticated and robust manner.
It’s important to show the prescribing gain or promotional response from an e-solution option, such as e-detailing, versus the base growth from traditional promotional tactics. In the industry, budgets are tight and getting tighter, so we must look at the trade-offs between different approaches. We also look for suppliers that focus on and have expertise in specialty segments. Many of the models that suppliers have in place are more applicable to the primary care market. In the biotech segment, 700-800 physicians often drive our target markets. Our suppliers must work hard to reach this physician base that is responsible for 70-75% of the market potential.

Luiggi. For e-business solutions, I don’t want something that looks glitzy that doesn’t have a business purpose. I include my business sponsors in many of the decisions and get them involved with the vendors as well. This way they can talk the language of the business particular to that e-solution. The biotechnology and pharmaceutical industries are challenging in that respect. Many of the vendors that are out there don’t really understand the pharmaceutical business. Before I bring any business partners (vendors) into the loop, I educate these providers about our business. I don’t want to waste the time of the VP of marketing or the VP of clinical research.

Hoyes. Whenever possible, we leverage our IT department and our own servers and software platforms to maximize cost effectiveness. But we also use a number of customized approaches, which may not be tied to our Agency of Record. We use partners for customized solutions depending on the objective of the project and the level of expertise required.

Ladner. We see a lot of solutions that suppliers and providers try to shoehorn our business into. This type of relationship is not very productive. We developed our own customized solution because many times what the supplier suggests costs an arm and a leg, whereas what we can afford may be a toe.
E-Efficiencies From ROI to increased audience reach to reduced paper files, e-solutions can, and do, provide true efficiencies throughout the life-cycle continuum. Greenbaum. When talking about any promotional activity in the e-space it is important to have a roadmap for the entire marketing campaign, from the beginning all the way through the completion of the project and what happens during follow up. When we talk about any type of e-space campaign, we begin with the idea that we want to establish a relationship either with the physician or the consumer. That’s what e-marketing is really trying to do. Hoyes. We have realized good ROI for the e-components that are part of our overall programs. We have been able to place our content into electronic archives and have been able to repurpose the content for different audiences or have used pieces in different ways. For example, when developing a base campaign and set of traditional supporting promotional materials, a lot of money is spent on the creative side of things…artwork, photo shoots, layout, etc…and on market research to tailor and fine-tune the messages in support of the brand positioning. Historically, after the materials were rolled out to the Sales Force, all these costs became one-time operating costs. Now, we utilize all the components not only to develop the base campaign, but also to feed into brand websites, outbound e-communications, short-term mini-sites on the web, relationship marketing programs, patient programming, etc. This makes it more cost effective to micro-market. In the past, each of these programs had larger set-up costs and therefore a higher ROI threshold to meet. Luiggi. We are in a brave new world where everybody needs to be technologically savvy, because everything is going to be electronic in a couple of years. We spend a lot of time on training, education, and bringing our employees up to speed on the latest technologies. 
We offer a lot of free courses on the different applications and technologies. Obviously, coming from a technical background I am biased, but if people are working in a biotech or pharmaceutical company and they are not willing to embrace technology I think they are behind the curve. There are two big things that come out of the pharmaceutical process. One is the actual product itself: the pill, the syringe, the device, and so on. The other is paper; we produce so much paper in this industry that we had better be focusing on automating the processes.

Ladner. We are using a customized e-solution that can handle huge numbers of antibodies, for example. We have projects in which we have evaluated more than 10,000 antibodies for a given target. By collecting the data from 10,000 antibodies, we can evaluate the targets, reduce the number down to 500, then down to 20. At the end, we might find six antibodies that have the biological properties we want. We could not do this without a good technology solution.

Luiggi. I don’t think we will ever get to a completely paperless environment; this is one of the biggest myths of all. Will we get a lot closer? Absolutely yes. Our long-term plan is to automate as much as possible and get rid of a lot of our hard-copy file space throughout the company. When I say file space, I am talking about physical paper within our storage rooms, file cabinets, and so on. There is absolutely no reason to maintain massive libraries of paper anymore, especially when most of the data can be scanned fairly inexpensively. We can’t get rid of all paper, though. There are always going to be times when people prefer to read documents in hard copy. I am one of those people; if a document is more than five pages, I don’t want to read it on the screen.

E-Content
Whether the e-solution is being used for detailing, marketing communications, or education, the content needs to be rich and engaging.

Hoyes.
We do specific stand-alone e-detailing (really pilot phases right now), but these are less than 5% of the overall budget. I do feel these are important. I think they allow us to supplement call frequency on some physicians and to extend reach. They also provide a platform to stay on top of all the emerging tactics. As more physicians indicate that this is a preferred way of receiving information, we are going to focus on how to tailor the message and make it as close as possible to a live venue. We may add other components, such as video links, which make the multimedia experience fresher and more engaging. We’ve also been able to reach physicians that we normally couldn’t reach, because they don’t typically see a sales representative. And we’ve been able to deliver our messages in a format that ensures greater control over the content and increased attention. Not to mention that many physicians tend to do the e-detail on a Friday night or Saturday, at a time when we cannot reach them through traditional sales calls. All of our core programs have a web presence or e-solution component. More than half of our spending has some e-component built into it. For example, a tactic could include basic things like email notifications and ongoing email reminders and alerts, but it also could include more sophisticated CRM programs and even e-scheduling, compliance tracking, and honoraria payments for Speakers Programs. We also use e-solutions for our patient programs and for REBIF compliance programs. So we try to work e-solutions into the fabric of everything we are doing because it increases the effectiveness, and the impact, of our communications.

Greenbaum. Last year, e-detailing was brand new in Europe, and we were the first company to offer e-detailing in Germany. At the time, there was no competition. Eventually e-detailing will reach a saturation level in Europe just as it has in the United States.
We’ve also done e-details in Latin America, Canada, Spain, Italy, France, and the United Kingdom. Part of our planning process for 2005 and 2006 includes an e-component.

Hoyes. I don’t think e-details are as effective as having our field sales team call on physicians. The program can be as tailored as possible through various screen flows, but nevertheless, I don’t think it replaces the mix of content, the needs analysis, and the personal touch of having our sales professional in the neurologist’s office. We’ve also found that for new concepts or launches, e-detailing is not all that effective, but that it can provide a supplemental call frequency or communicate reminder messages for more established brands. That type of approach does provide a good ROI, especially in synergy with Sales Force activity. I think that is because for launch products, part of becoming an early adopter is the trust factor. Physicians want to meet with the sales professional or the medical liaison, people they have a strong relationship with, and make sure all their questions are answered.

Greenbaum. We’ve had a positive return on investment for every single e-detail that we’ve done. With e-detailing there are seven or eight minutes of uninterrupted time with a physician where we’re able to communicate a marketing message, clinical information, safety information, and basic product information. In some cases, we can reach physicians who have never been called on before and in some cases these are no-see physicians. It is one channel through which we can get a physician’s attention on a product. But e-detailing is not for every product. Companies have to be careful about who they are targeting, and they have to take into consideration where the product is in its life cycle. E-detailing is not a replacement for the field salesforce. It really depends on the product.

PharmaLinx LLC, publisher of the VIEW, welcomes comments about this article. E-mail us at [email protected].
Life-Sciences IT Spending to Increase

Worldwide IT spending for the life-sciences sector will reach $38.9 billion in 2008, driven by the need to tackle industry pain points, such as regulatory burdens and costly clinical trials, and by life-sciences companies leveraging IT to enable personalized medicine. This is the finding of a recent report by Life Science Insights (LSI).

Life-sciences companies are pouring significant resources into applications that help them act intelligently on the massive quantities of data generated within their organizations. Additionally, they are optimizing the operation of their hardware investments through infrastructure software investments. Significant improvements in certain industry-specific software categories, such as CTMS, are also encouraging some companies to migrate from internally developed systems to off-the-shelf solutions.

LSI’s recently published research, Green Book Update: IT Spending Model 2004-2008, provides market size and forecast information on IT spending within the life-sciences industry across 21 technology segments, six regions, and six buyer segments, including academic institutions, biotechnology companies, diagnostic and health-screening companies, government agencies, pharmaceutical companies, and contract research organizations (CROs).

According to the market model, investments in hardware are also expected to exceed average IT market growth rates, as information infrastructure needs remain a top priority among life-sciences companies. Network equipment is also a strong growth hardware category as life-sciences companies increase the transmission capacities of their networks to accommodate the ever-increasing volumes of data generated and the growing need for real-time collaboration across departments, functional areas, physical sites, and geographies.

Source: Life Science Insights, Framingham, Mass. For more information, visit lifescience-insights.com.
Thought Leaders

• Mark Bard. President, Manhattan Research LLC, New York; Manhattan Research is a marketing information and services firm that helps healthcare and life-sciences organizations adapt, prosper, and explore opportunities in the networked economy. For more information, visit manhattanresearch.com.
• Susannah Fox. Associate Director, Pew Internet & American Life Project, Washington, D.C.; Pew Internet & American Life Project produces reports that explore the impact of the Internet on children, families, communities, the work place, schools, healthcare, and civic/political life. For more information, visit pewinternet.org.
• Patricia Greenbaum. Deputy Director, Global E-Business, Global Strategic Marketing, Bayer Pharmaceuticals Corp., West Haven, Conn.; Bayer, part of the worldwide operations of Bayer HealthCare AG, is one of the world’s leading, innovative companies in the healthcare and medical products industry. For more information, visit bayerpharma.com.
• James Hoyes. Executive VP, Neurology, Serono Inc., Rockland, Mass.; Serono is a global biotechnology leader in the neurology, metabolism, growth, and psoriasis areas. For more information, visit seronousa.com.
• Robert Charles Ladner, Ph.D. Senior VP and Chief Technology Officer, Dyax Corp., Cambridge, Mass.; Dyax discovers, develops, and commercializes innovative biopharmaceuticals, including fully human monoclonal antibodies, as well as small proteins and peptides, for unmet needs in the areas of inflammation and oncology. For more information, visit dyax.com.
• Raymond Luiggi. VP of Corporate Services and Global IT, Cell Therapeutics Inc., Seattle; Cell Therapeutics is a biopharmaceutical company committed to developing an integrated portfolio of oncology products aimed at making cancer more treatable.
For more information, visit cticseattle.com. VIEW on E-Solutions July 2005 Top Five Challenges for Enterprise IT Infrastructures What enterprise IT infrastructure managers struggle with most is ensuring consistent end-to-end application and service performance guarantees, according to a survey by Forrester Research Inc. In its report, Top Five Challenges For Enterprise IT Infrastructure Managers — And How To Resolve Them, Forrester presents the results of interviews with 67 enterprise IT infrastructure managers at $1 billion-plus companies about their key challenges in running the corporate IT infrastructure. Forrester picked out the top five issues that companies struggle with and compiled best practices on how companies are successfully attempting to overcome these challenges. These include: Consistent end-to-end application and service performance guarantees. Business decision-makers increasingly demand, rather than just expect, consistent service-level guarantees for key applications or services across the whole enterprise. Forrester advises IT infrastructure managers to develop a service catalogue and look at implementing service-level management (SLM) and/or business service management (BSM) technologies, the latter as a way to measure the overall service quality from an end-user perspective. Unplanned infrastructure changes resulting in incidents and downtime. Manual interference and the lack of consistent service management processes are still the No. 1 source of incidents resulting in end-user downtime. Unplanned and untested infrastructure changes remain at the heart of the problem. According to Forrester, this cries out for automation and for companies to implement a rigorous change management process — using ITIL (IT infrastructure library) best practices — and dynamically link infrastructure to applications. Unanticipated infrastructure effects from consolidation and new application projects. 
Almost two-thirds of IT managers expect further infrastructure consolidation in the near future. To deal with the huge management challenge for IT infrastructure managers that results from such consolidation projects, Forrester points to what may sound obvious: implement testing before deployment. This should be an accepted best practice, but Forrester finds that many IT managers still don’t do it.

Misconfiguration of network objects. Forrester warns that misconfigured network objects can cause sudden catastrophe to a company’s most critical applications. Although there is no silver bullet here, Forrester believes that change management based on solid tools and ITIL best practices will go a long way. To really see an improvement here, companies should look to automate network configuration and routine tasks.

Wide area network (WAN) performance. With companies now starting to deploy voice over IP on a large scale, infrastructure managers have added to their lists of concerns the issue of how to separate mission-critical traffic from the rest. Forrester’s advice: implement WAN bandwidth management and take a look at WAN traffic compression.

“Enterprise IT infrastructure managers tell us that the challenges they face are becoming more severe,” says Thomas Mendel, principal analyst at Forrester Research. “But forward-looking companies are already addressing the most important challenges with a mix of ITIL process implementations for service delivery and new tools like service catalogs, SLM/BSM, autodiscovery technologies, network configuration management, WAN traffic compression, and bandwidth management. Companies should look at their specific situation and use these examples as best practice guidelines.”

Source: Forrester Research Inc., Cambridge, Mass. For more information, visit forrester.com.

Health Information Online
Susannah Fox, Associate Director, Pew Internet & American Life Project

According to a recent study from the Pew Internet & American Life Project, eight of 10 Internet users have looked online for health information. Additionally, 79% of Internet users have searched online for information on at least one major health topic, which translates to about 95 million American adults (18 years of age and older) who use the Internet to find health information.

Speed of Access and Years of Online Experience
According to Pew’s Health Information Online report, there are many reasons why Internet users might now be more likely to search for certain types of health information. Many health-related Websites are supplying more content, which might be driving users toward certain topics. The call by government agencies for obesity awareness and public education about nutrition may also be increasing public awareness and prompting more traffic. The pharmaceutical industry’s marketing campaigns may be paying off in increased interest in their products. More Americans may be looking for good deals on health insurance or checking up on their hospitals’ quality ratings online. The interest in experimental treatments may be growing as Internet users become aware of the possibilities available to them.

Two ongoing trends in the Internet population may reinforce some users’ greater tendencies to seek out certain health information online.
There are now many more Internet users with high-speed or broadband access at home. Those who have high-speed connections are, in many cases, more likely than those with dial-up connections to have sought various types of information online. Additionally, there are many more Internet users with six or more years of online experience. These “power users” may now turn to the Internet not only when they have a pressing concern, but when they have everyday health questions about diet, fitness, or to check whether something is covered by their health insurance. A Demographic Breakdown The general makeup of the Internet population has not changed dramatically in the past two years. In December 2002, 57% of American adults said they had Internet access. In November 2004, at the time of the most recent Pew survey, 59% of American adults said they had Internet access. As in 2002, certain groups of Internet users are the most likely to have sought information online: women, Internet users younger than 65 years of age, college graduates, those with more online experience, and those with broadband access. Online health searches expand in areas such as diet, fitness, and drug information When it comes to online health searches, specific diseases and treatments continue to be the most popular topics. But the greatest growth is in seeking information about doctors and hospitals, experimental treatments, health insurance, medicines, fitness, and nutrition. The typical health seeker has searched for five topics. About one-third of health seekers have searched for seven or more topics. 
Health Seekers

Demographic group                                         % who have searched for health info
Sex
  Online women                                            82%
  Online men                                              75%
Users’ Age
  Internet users age 18-29                                77%
  Internet users age 30-49                                81%
  Internet users age 50-64                                82%
  Internet users age 65+                                  66%
Education Level
  Internet users with a high school diploma               67%
  Internet users with some college education              80%
  Internet users with a college degree                    86%
Internet Access
  Internet users with 2 to 3 years of online experience   66%
  Internet users with 6+ years of online experience       86%
  Internet users with a dial-up connection at home        72%
  Internet users with a broadband connection at home      87%

Health Topics Searched Online
Internet users who have searched for info (%)

Health topic                                            2002   2004
Specific disease or medical problem                       63     66
Certain medical treatment or procedure                    47     51
Diet, nutrition, vitamins, or nutritional supplements     44     51
Exercise or fitness                                       36     42
Prescription or over-the-counter drugs                    34     40
Health insurance                                          25     31
Alternative treatments or medicines                       28     30
A particular doctor or hospital                           21     28
Depression, anxiety, stress, or mental health issues      21     23
Experimental treatments or medicines                      18     23
Environmental health hazards                              17     18
Immunizations or vaccinations                             13     16
Sexual health information                                 10     11
Medicare or Medicaid                                       9     11
Problems with drugs or alcohol                             8      8
How to quit smoking                                        6      7

Source: Susannah Fox, Health Information Online, May 17, 2005, Pew Internet & American Life Project, Washington, D.C. For more information, visit pewinternet.org.

The Role of E in DTC

Pharmaceutical companies that optimize the online channel as part of an integrated direct-to-consumer advertising strategy are well-positioned to increase overall product awareness and are more likely to engage empowered health consumers who take action, according to Manhattan Research LLC.
“Considering the reach and relevance of current DTC ad recall, it can be argued that the current approach to pharmaceutical advertising, focused mainly on television, is a lopsided one,” says Mark Bard, president of Manhattan Research. “Our analysis indicates that a surround-sound strategy — one that includes multiple marketing channels, including the Internet — may be the most effective way to reach consumers in today’s uncertain market.” The report, Truly Integrated DTC: The Role of ‘E’ in Pharmaceutical Consumer Marketing, is based on interviews with more than 7,500 U.S. adults. The report, which contains data specific to more than 30 therapeutic segments, found: • Despite increased overall spending on DTC advertisements, the population who recall pharmaceutical ads — about 150 million adults — has been stagnant for the past several years. • E-health consumers — those who research health and drug information online — are more likely than all adults to recall DTC ads, with a 10% greater recall rate than the average consumer. • Among those who do not recall DTC ads — 32% of adults — more than 50% are actively online and a growing number are using the Internet as a source of drug information, representing a true growth opportunity for pharmaceutical marketers. This group tends to be outside the “mainstream” target of DTC advertising (ethnic minorities, in poor health), yet 53% still take a prescription medication. The online channel presents a potentially complementary channel to reach this audience (38.4 million strong) beyond traditional broadcast mass media. • U.S. adults seeking any information after viewing DTC ads are three times more likely to use the Internet as a source of information than they are to use the toll-free number mentioned in the advertisements. Additionally, online consumers are five times more likely to use the Internet than the toll-free number. • The Internet is not just for DTC responders anymore. U.S. 
adults with questions about a prescription medication they are taking are six times more likely to go online in search of answers than they are to call the toll-free number. • The opportunity for loyalty marketing programs via the Internet is a big one, with an estimated 34 million consumers expressing interest in such a program for a prescription they are taking. This group is also twice as likely to have requested a prescription of choice from their physician in the past 12 months. After consumers are exposed to a product through any medium, the Internet is increasingly becoming a critical response channel. Mark Bard President of Manhattan Research Source: Manhattan Research LLC, New York. For more information, visit manhattanresearch.com. Defining Data Integration Data integration is the process of accumulating and combining data sets from disparate sources at various locations. These data can then be used for business intelligence, CRM, data mining, or other applications that involve the analysis of data to make key decisions. Data integration incorporates a series of processes — the sequence of applications that extract data from various sources, bring them to a data staging area, programmatically prepare the data for migration into the data warehouse, and load the data into the data warehouse and data marts. There are numerous software products to choose from in the marketplace to help get this done. But because of the wide range of choices, selecting the right tools can be a difficult process. It’s important to consider all of the capabilities needed not just now but also as projects expand. For instance, consider operations such as data conversion, cleansing, formatting, and aggregation. Usually after the data are extracted a number of transformations may be applied in preparation for data consolidation and subsequent loading into data warehouses, data marts, or dimensional data structures used for decision support systems or business intelligence systems. 
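The sequence described above (extract from several sources, bring the records to a staging area, prepare them, then load) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two source systems, their record layouts, the field names, and the in-memory dictionary standing in for a data warehouse are all hypothetical, and a real project would rely on dedicated integration tools rather than hand-written code like this.

```python
from collections import defaultdict

# Hypothetical source extracts: two systems exporting similar records in different shapes.
STORE_SYSTEM = [
    {"store": "S001", "sku": "A-10", "qty": "3", "date": "2005-06-01"},
    {"store": "S001", "sku": "A-10", "qty": "2", "date": "2005-06-01"},
    {"store": "S002", "sku": "B-20", "qty": "5", "date": "2005-06-02"},
]
DISTRIBUTOR_SYSTEM = [
    ("2005-06-02", "s002", "b-20", 4),
    ("2005-06-03", "s003", "a-10", 1),
]


def extract():
    """Pull raw records from each source into one staging list, tagged by origin."""
    staging = [("store", rec) for rec in STORE_SYSTEM]
    staging += [("distributor", rec) for rec in DISTRIBUTOR_SYSTEM]
    return staging


def transform(staging):
    """Convert, cleanse, and format every staged record into one common layout."""
    rows = []
    for origin, rec in staging:
        if origin == "store":
            # Conversion: quantities arrive as text in this (assumed) source.
            row = {"date": rec["date"], "store": rec["store"],
                   "sku": rec["sku"], "qty": int(rec["qty"])}
        else:
            date, store, sku, qty = rec
            # Cleansing: this (assumed) source uses lower-case codes.
            row = {"date": date, "store": store.upper(),
                   "sku": sku.upper(), "qty": qty}
        rows.append(row)
    return rows


def aggregate(rows):
    """Sum quantities so only one record per (date, store, sku) is loaded."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["date"], row["store"], row["sku"])] += row["qty"]
    return dict(totals)


def load(totals, warehouse):
    """Write the prepared records into the stand-in warehouse."""
    warehouse.update(totals)
    return warehouse


warehouse = load(aggregate(transform(extract())), {})
for key in sorted(warehouse):
    print(key, warehouse[key])
```

Keeping each stage as its own function mirrors the point made above about choosing tools with future needs in mind: a new conversion or cleansing rule can be added to the transform stage without touching extraction or loading.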
The Benefits of Data Integration Why would someone want to embark on a data integration project when it sounds like a complicated and time-consuming process? Companies integrate their data for a variety of reasons, including cost savings and improved analysis capabilities. The following are the key benefits a company will experience with integrated data: Availability of data. Once the data have passed through the data integration, the information is in a format that can be used by the various departments within a company. More decision makers have access to the data in a form they can use. For example, the data can be passed on to various data marts and the managers in each department can access and analyze the data to make key decisions. Enhanced data quality. After the data have gone through the data-cleansing process, the data will supply a company with clear, accurate information. For instance, one financial services company now uses data to better understand customer behavior and satisfy customer needs as efficiently as possible. The company mines data from its day-to-day operations with a view to improving customer relations. Better manageability. The data can be seamlessly used in a wide variety of applications, providing users with greater accessibility and manageability. For example, a company can use the same data for data mining, customer relationship management programs, business intelligence applications and much more. Improved decision making. The data are now in a clearly defined format, reducing confusion as to how it can be used and the type of information it can provide. For instance, one retailer integrates data from its 2,800 stores and 40 distributors throughout the United States. This allows the company to determine which stores have what inventory and to improve its negotiations with vendors because it can find out the exact number of parts those vendors have ordered in the past. 
The data warehouse also provides an up-to-date snapshot of sales, allowing the company to quickly determine the amount of sales a particular store has had in the past three months.

Higher return on investment. By transforming the data so that they can be used in multiple applications, a company can achieve a higher ROI. The amount of data is reduced, allowing the company to leverage its existing hardware investment. Also, by selecting the right solutions for a data integration project, a company is able to maximize the ROI by reducing elapsed times as well as the time to administer, monitor, and manage data warehousing applications.

Beginning the Process
Data integration begins with the definition of the data requirements of the company. This includes deciding what data analysis applications are needed as well as examining the type of data that are available. In addition, it will be necessary to identify where each kind of data comes from, how often data are updated, how data are currently being used, and where data can be stored within the company. Then the company has to decide how it is going to use those data and what must be done to clean the data and transform them. Once this is completed, the next step is to preprocess the data before loading that information into the data warehouse and database. Preprocessing allows for the reformatting of the data for seamless integration and also results in faster and more efficient database and data warehouse loads.

Preprocessing Techniques
Below are five major preprocessing techniques that will improve the success rate of a data integration project.

Selection. Many sites begin their database and data warehouse loads with a mountain of data gathered from heterogeneous systems and/or different processing locations. The important thing is not just the selection of the data needed, but that the information is selected as quickly and efficiently as possible.
A custom application could be created to do this, but writing the code would be too time consuming. Instead, it’s faster to use software that is designed specifically to extract just the parts of the data specified. Selection should usually be done first, so that the rest of the preprocessing works with the least amount of data possible.

Reformatting. Chances are that records that come from several different systems will arrive with radically different and possibly incompatible formats. Reformatting allows for the rearrangement of the fields in these records so that all records have the same load record image. As an added bonus, reformatting eliminates any fields that aren’t needed in the database or data warehouse tables.

Aggregation. Aggregation is used to eliminate duplicate records. At the same time, it’s possible to aggregate and sum a set of records by adding important numeric fields together, and then load only one record per set that contains the total. This kind of “summing” can be a critical element in optimizing query processing with aggregate tables.

Grouping. Grouping allows the data to be split into separate, partitioned table ranges after they have been selected, reformatted, and summarized. Splitting data into several files during one operation is far more efficient than running multiple select applications to create the same number of files.

Sorting. The data should be sorted into the order in which they will be stored in the tables or by which they will be indexed. This can significantly improve the performance of the data analysis application.

The Data Quality Challenge

While preprocessing is a key step, the success of a data integration project also depends on the quality of the data. The quality dilemma is easy to appreciate in a simple example of variant data. Here are five variations that might appear in the key field for a data warehouse query:

Jon Smith
Jonathan Smith
J. Smith
Jon R. Smith
Jonathan R. Smith

Although it is unlikely that these variants will occur all at once, even two or three can severely skew results. Variants can cause overestimates if duplicate records have been created, or underestimates if a substantial number of records with variants such as misspellings or abbreviations are missed during aggregation.

High-quality data are defined as data that are complete, valid, consistent, timely, and accurate. Accomplishing all this is a critical part of any data integration project. There is a three-step procedure for improving the quality of the data: research, remediate, and enhance. Although the third step is optional, enhancement is important because it can improve query results in a variety of ways, especially for marketing and sales applications.

Research

The first step, research, has two phases: identify any inconsistent data and find out how the bad data are entering the system. A variety of tools can be used to identify bad data. Some of these tools are dedicated and often quite slow, while others are fast and flexible. It is also important to know how bad data are entering the system, especially if data cleaning procedures were followed rigorously. Some possible sources of bad data are poorly trained data entry personnel and/or inadequate or absent cleaning programs. For example, keys that are sometimes in mixed upper and lower case and at other times all in upper case could be the problem if a cleaning program is not case sensitive but the query program is.

Remediate

Once the variant data have been identified, a procedure to remediate those data and to prevent future corruption must be worked out and put in place. Again, a dedicated tool can be used for this procedure.

Enhance

A thorough data quality program includes two phases. First, high-quality data are prepared for loading. Data are either captured in a standard, error-proof way or are cleaned in preparation for loading, as described above. Data can also be enhanced for further analysis.
For example, demographic and/or lifestyle information can be added to customer records before the actual load. Enhancing customer data usually entails combining multiple sources of data, and these data are often held in multiple databases on disparate platforms. Using a data manipulation tool makes coordinating all these different sources much easier.

Get Ready, Get Set, Load

At this point, the data have been preprocessed and cleansed. Now the data are ready to be loaded into the data warehouse before being used by various applications. By completing the sorting and aggregation discussed previously, dramatic performance gains can be achieved when loading. A specialized load program can be used to complete the process.

Successful data integration can mean the difference between a company that realizes its full potential and one that just keeps missing the mark. After all, data integration offers the user a higher degree of accuracy, increased availability, improved manageability, better decision-making capabilities, and much more. This is because all the data have been taken from a variety of different sources, cleansed, reformatted, and transferred into a data warehouse where they can be accessed by multiple users. The quality of the data is high, giving users a degree of accuracy that they didn’t have previously.

Source: Craig Abramson, Senior Technical Analyst, Syncsort Inc., Woodcliff Lake, N.J. For more information, visit syncsort.com.
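
The five preprocessing techniques described in this sidebar (selection, reformatting, aggregation, grouping, and sorting) can be illustrated with a small Python sketch. It is not taken from the source and does not represent any particular vendor's product. The sample records, the field names (store, sku, qty), and all function names are hypothetical, chosen only to show the order in which the steps might be applied before a load.

```python
from collections import defaultdict

# Hypothetical raw records from source systems with different layouts.
RAW = [
    {"system": "pos", "store": "S2", "sku": "A1", "qty": "3", "clerk": "kim"},
    {"system": "pos", "store": "S1", "sku": "A1", "qty": "2", "clerk": "lee"},
    {"system": "web", "STORE_ID": "S1", "SKU": "A1", "QTY": 5},
    {"system": "pos", "store": "S1", "sku": "B7", "qty": "1", "clerk": "lee"},
    {"system": "hr", "employee": "kim"},  # not needed for the sales load
]


def select(records):
    """Selection: keep only the records the load needs, as early as possible."""
    return [r for r in records if r["system"] in ("pos", "web")]


def reformat(record):
    """Reformatting: map each source layout onto one load record image,
    dropping fields the warehouse table does not need (such as clerk)."""
    if record["system"] == "pos":
        return {"store": record["store"], "sku": record["sku"],
                "qty": int(record["qty"])}
    return {"store": record["STORE_ID"], "sku": record["SKU"],
            "qty": int(record["QTY"])}


def aggregate(records):
    """Aggregation: collapse records that share a key into one summed record."""
    totals = defaultdict(int)
    for r in records:
        totals[(r["store"], r["sku"])] += r["qty"]
    return [{"store": s, "sku": k, "qty": q} for (s, k), q in totals.items()]


def group(records):
    """Grouping: split the data into one partition per store in a single pass."""
    partitions = defaultdict(list)
    for r in records:
        partitions[r["store"]].append(r)
    return dict(partitions)


def sort_partitions(partitions):
    """Sorting: order each partition by the column the table is indexed on."""
    return {store: sorted(rows, key=lambda r: r["sku"])
            for store, rows in partitions.items()}


def preprocess(records):
    """Run the five steps in the order the sidebar describes them."""
    reformatted = [reformat(r) for r in select(records)]
    return sort_partitions(group(aggregate(reformatted)))


if __name__ == "__main__":
    for store, rows in sorted(preprocess(RAW).items()):
        print(store, rows)
```

Selection runs first so that the later steps touch as few records as possible, and grouping produces all partitions in one pass rather than one selection pass per partition, which mirrors the efficiency points made in the text.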