By Bill Pollock
These days more than ever, businesses operate in data-rich environments. Data emanates from everyday business operations, sales and customer account activities, service calls, financial and economic transactions, regulatory reporting and all the other events that are routinely captured and stored in databases. Existing global databases are adding terabytes of new information daily. Every moment of every day, bank transactions and electronic funds transfers, point-of-sale systems, hospital tests and procedures, factory production lines, airline reservations, service calls and even electric meters and gasoline pumps create digital records that are stored somewhere in a database.
The vast majority of this data, however, will never see the light of day. More often than not, data will be stored for a specified period of time, in some cases as required by law, and then purged to make room for more current data of the same kind. This process is likely to repeat ad infinitum. Yet in many cases this data can represent a rich ore of valuable information and knowledge about the domain from which it has been taken.
What better source is there to learn about patterns of customers' preferences and buying habits than from the customers themselves? Not just what they tell you they need or like in a customer survey, but what they actually buy. What better source is there to learn about equipment failures and service requirements than from the equipment itself? Not just from what your field technicians tell you, but directly from the equipment. What better source is there to learn about the risk in lending or extending credit than from your business's own financial successes and failures? Not just from what your banks or creditors tell you, but from your own financial experiences, both good and bad. The list goes on and on.
Organizations are always searching for knowledge that can advance their cause and keep them abreast of the market, anticipated trends and the competition. Marketing managers would love to know what makes their customers tick. Manufacturing managers would do anything to find out how they could improve the quality of their products, even by just a fraction of a percentage. Not to mention the securities traders who would "sell their corporate souls" just to keep a half-step ahead of the pack in being able to detect a change in trends.
Oftentimes the answers to these questions are contained in the data that businesses routinely collect, store and discard from their ever-growing databases. Many companies have already recognized the potential of this source of knowledge and have invested substantial effort and resources to uncover the precious knowledge "hidden" in their data. Among the various emerging technologies being used, some combine traditional and newer paradigms in a field known as knowledge discovery, or database mining.
Digital marketing companies use related methods to create more targeted lists for the products and services they promote, improving their overall effectiveness. Automotive companies use the same techniques to discover patterns of failures and corresponding information to incorporate into the proprietary knowledge bases that they distribute to their authorized dealers and licensed mechanics. Many more applications of a similar nature span businesses and industry segments of all types under the banner "let the data work for you."
The analogy between database mining and ore mining is an apt one. In ore mining, tons and tons of dirt are processed in order to extract one precious gram of gold. Similarly, in database mining, one may need to go through very large quantities of data just to get to the one piece of information that makes it all worthwhile.
Machine Learning Enables Efficient Data Mining
Machine learning techniques, developed under the umbrella of Artificial Intelligence (AI), were originally patterned after a distinctive trait of human intelligence: the ability to acquire and create new knowledge. From this basis, new and highly sophisticated AI techniques have been developed, drawing on a broad array of disciplines and strategies and achieving varying levels of success.
Today, knowledge discovery tools and methods employ a wide range of technologies and methodologies. Neural networks are probably the best known and most widely used approach to machine learning. The technology is quite versatile and relatively mature, and it has been used very successfully in applications ranging from screening credit card applications, to placing geographically based advertisements in national magazines, to reading handwritten addresses and routing the mail. Other discovery methods are based on technologies such as information theory, fuzzy set theory, rough set theory, nearest neighbor metrics and others.
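To make one of these methods concrete, here is a minimal sketch of the nearest neighbor idea applied to credit screening. It is an illustration only: the function name, the two features (income and existing debt) and every record in the history are invented for this example, and a real system would use far more data, scaled features and more than one neighbor.

```python
import math

def nearest_neighbor(records, query):
    """Return the label of the stored record closest to `query`.

    `records` is a list of (features, label) pairs; closeness is measured
    with plain Euclidean distance, an assumption made for this sketch.
    """
    best_label, best_dist = None, math.inf
    for features, label in records:
        dist = math.dist(features, query)  # requires Python 3.8+
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical, made-up history:
# (annual income in $1000s, existing debt in $1000s) -> what happened
history = [
    ((85.0, 5.0), "repaid"),
    ((72.0, 10.0), "repaid"),
    ((30.0, 25.0), "defaulted"),
    ((28.0, 18.0), "defaulted"),
]

# A new applicant is assigned the outcome of the most similar past account.
print(nearest_neighbor(history, (35.0, 20.0)))  # prints "defaulted"
print(nearest_neighbor(history, (80.0, 8.0)))   # prints "repaid"
```

The point of the sketch is that the "knowledge" comes entirely from the company's own stored records: no rule about income or debt is written by hand.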
Why knowledge discovery? Your organization may be sitting on a goldmine of data which could be converted into useful knowledge – knowledge that can be used to help you focus your strategic and marketing planning efforts; monitor and improve the quality of your production and service delivery processes; and explain your customers' sensitivity to your competitive pricing structure, customer service performance, brand name recognition, advertising and promotional campaigns or anything else you would like to learn about the markets in which you operate.
Many organizations have already recognized the potential benefits of these technologies and are using these tools to run smarter, more efficient and more productive operations. The list of such companies is growing every day, and your organization should put this knowledge to work and join them.