Many data scientists expect that data is already cleaned and waiting in the database, ready for the sexy stuff: visualization and predictive insights. They don’t realize how many steps come first: how to access the data, which SQL database holds it, how it should be manipulated, and whether further cleaning is needed before processing.
Yet, according to a recent CrowdFlower survey reported in Forbes, data scientists spend 80% of their time on cleaning and preparing data, and 57% of them regard cleaning and organizing data as the least enjoyable part of their work.
Data science use cases only become realistic once the data scientist has collected the data (including making recommendations on real-time collection) and prepared it.
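As a rough illustration of those preparation steps, the sketch below pulls raw rows out of a SQL database and applies a few basic cleaning rules. It assumes a hypothetical `machine_logs` table (columns `machine_id`, `ts`, `temperature`) in an in-memory SQLite database accessed through Python's standard `sqlite3` module. The cleaning rules (drop missing readings, normalize machine IDs, drop duplicate machine/timestamp rows) are illustrative choices, not a prescribed pipeline.

```python
import sqlite3


def load_clean_readings(conn):
    """Pull raw sensor rows from SQL and apply basic cleaning rules."""
    rows = conn.execute(
        "SELECT machine_id, ts, temperature FROM machine_logs ORDER BY ts"
    ).fetchall()
    seen = set()
    cleaned = []
    for machine_id, ts, temp in rows:
        if temp is None:  # drop rows with a missing reading
            continue
        machine_id = machine_id.strip().upper()  # normalize inconsistent IDs
        key = (machine_id, ts)
        if key in seen:  # drop duplicate rows for the same machine and time
            continue
        seen.add(key)
        cleaned.append((machine_id, ts, float(temp)))
    return cleaned


# Hypothetical raw data with typical problems: duplicates, gaps, messy IDs
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE machine_logs (machine_id TEXT, ts TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO machine_logs VALUES (?, ?, ?)",
    [
        ("m1", "2016-01-01T00:00", 71.5),
        ("m1", "2016-01-01T00:00", 71.5),   # exact duplicate
        (" M1", "2016-01-01T00:05", None),  # missing reading
        ("m1 ", "2016-01-01T00:10", 72.0),  # untrimmed, lower-case ID
    ],
)
cleaned = load_clean_readings(conn)
print(cleaned)
# [('M1', '2016-01-01T00:00', 71.5), ('M1', '2016-01-01T00:10', 72.0)]
```

Even this toy table needs three separate rules before it is usable, which is why preparation tends to dominate a project's time.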
From the list, we have focused on B2B use cases in manufacturing, selected on the basis of compound annual growth rate and forecasted corporate investment. According to IDC, worldwide revenues for big data and business analytics will grow from nearly $122B in 2015 to more than $187B in 2019, an increase of more than 50% over the five-year forecast period. One of the industries presenting the largest revenue opportunities in big data analytics is discrete manufacturing, with forecasted revenues of $22B in 2019.
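The IDC growth figures can be checked with a little arithmetic. Total growth is end/start - 1, and compound annual growth rate (CAGR) is (end/start)^(1/n) - 1 for n compounding intervals. This sketch assumes n = 4 (the yearly steps from 2015 to 2019) and uses the rounded headline figures quoted above, so its result may differ slightly from any rate IDC itself publishes.

```python
def total_growth(start, end):
    """Fractional growth from start to end."""
    return end / start - 1


def cagr(start, end, intervals):
    """Compound annual growth rate over the given number of yearly intervals."""
    return (end / start) ** (1 / intervals) - 1


revenue_2015 = 122.0  # $B, rounded figure from the text
revenue_2019 = 187.0  # $B, rounded figure from the text

print(f"{total_growth(revenue_2015, revenue_2019):.1%}")  # 53.3%
print(f"{cagr(revenue_2015, revenue_2019, 4):.1%}")       # 11.3%
```

The roughly 53% total confirms the "more than 50%" claim, and it corresponds to a compound rate of about 11% per year.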
Equipment Maintenance. Businesses lose resources when equipment malfunctions, and companies in industries ranging from retail and automotive to energy lose money because their factory equipment maintenance is reactive. Data science approaches can detect when a machine requires maintenance by reviewing historical data and logs and by designing optimal real-time data collection, reducing unplanned downtime and costs. Data combined from the ERP system, IoT devices (if installed) and other field systems can then be analyzed with supervised or unsupervised machine learning models, enabling real-time optimization.
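A minimal unsupervised sketch of the idea: flag sensor readings that deviate sharply from recent behavior. This swaps in a simple trailing-window z-score rather than a trained machine learning model, and the vibration readings, window size and threshold are hypothetical values chosen for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # flat baseline: z-score undefined, skip this point
        if abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical vibration amplitudes (mm/s) from one machine's sensor log
vibration = [2.1, 2.0, 2.2, 2.1, 2.0, 2.1, 2.2, 2.0, 6.5, 2.1]
print(flag_anomalies(vibration))  # [8]
```

Only the spike at index 8 is flagged. In practice the baseline would come from historical logs and flagged points would be reviewed before scheduling maintenance; a supervised approach would instead train on labeled failure records, where they exist.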
Fleet Management. Tracking deliveries between locations and drivers from logistics centers has always been a fairly manual process. This is because most truck operations are decentralized, most drivers work for different companies, and the nature of the job means data gets lost or is never collected. Data on the fleet, individual trucks, goods-collection timing and other factors can be improved and made closer to real time. Data scientists can review data flows and collection efforts, recommend ways to improve collection and adherence, and eventually build an effective prediction model. This helps drivers get goods to the right customer at the right place and time, reducing sales lost to delays and increasing customer lifetime value.
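As a deliberately simple stand-in for a delivery prediction model, the sketch below fits an ordinary least-squares line from route distance to delivery duration and flags deliveries whose predicted duration exceeds the promised window. The trip data, the single distance feature and the function names are hypothetical; a real model would draw on the richer fleet, truck and collection-timing data described above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_delivery(distance_km, promised_minutes, intercept, slope):
    """Return (predicted minutes, at_risk) for one planned delivery."""
    predicted = intercept + slope * distance_km
    return predicted, predicted > promised_minutes


# Hypothetical historical trips: route distance (km) and actual duration (min)
distances = [10, 20, 30, 40]
durations = [24, 46, 64, 86]

intercept, slope = fit_line(distances, durations)
predicted, at_risk = predict_delivery(25, 50, intercept, slope)
print(f"{predicted:.1f} min, at risk: {at_risk}")  # 55.0 min, at risk: True
```

Flagging at-risk deliveries before dispatch is what lets dispatchers reroute or notify customers, rather than discovering the delay afterwards.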
Product Logistics and Inventory. Knowing the status of product flows in real time is important for assessing and optimizing inventory. According to the Inventory Management Society, carrying costs for inventory are 25%. These include warehousing costs such as rent, utilities and salaries; financial costs such as opportunity cost; and inventory costs related to perishability, shrinkage and insurance. In short, the more inventory you hold, the more expensive the operation. With data science, product flow from materials sourcing through finished product to customer delivery can be analyzed, ultimately reducing stock-outs and inventory obsolescence (waste).
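The carrying-cost figure translates directly into money. A minimal sketch, assuming the 25% rate applies annually to average inventory value (the usual way carrying costs are expressed) and using hypothetical inventory values:

```python
CARRYING_RATE = 0.25  # carrying-cost rate cited above, assumed to be annual


def annual_carrying_cost(avg_inventory_value, rate=CARRYING_RATE):
    """Yearly cost of holding inventory: rate x average inventory value."""
    return avg_inventory_value * rate


# Hypothetical average inventory values, before and after better flow analysis
before = annual_carrying_cost(2_000_000)
after = annual_carrying_cost(1_600_000)
print(before, after, before - after)  # 500000.0 400000.0 100000.0
```

Under these assumptions, every dollar of average inventory removed saves 25 cents per year, which is why reducing excess stock without causing stock-outs pays off.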