Overview of successful predictive maintenance:
1. A predictive maintenance system can be used to estimate time to failure by monitoring fault conditions, because the deterioration of an incipient fault follows an exponential model. Predictive maintenance is only effective on incipient faults, as opposed to abrupt or intermittent faults.
2. A predictive maintenance system should be able to perform an accurate quantitative analysis of the signals derived from each specific fault in order to quantify asset manufacturing quality. Monitoring faults from an early stage can effectively avoid the knock-on effect, and can serve as a basis for assessing procurement or maintenance quality. The knock-on effect is the reason assets age and MTBR (Mean Time Between Repair) is reduced.
3. Because it is vital to quantify asset manufacturing quality and to accurately quantify asset condition at any given time, a predictive maintenance system must be able to handle signal noise, determine transducer accuracy, handle fault signal overlap, identify operation and load variables, and perform its analysis on a quantitative platform.
4. Batch production assets: Focus on the knock-on effect and do not let faults deteriorate into the knock-on effect zone, where MTBR is shortened, aging speeds up and faults cause additional wasted power consumption. According to a US DOE report, avoiding the knock-on effect zone can reduce maintenance cost by 25~30%. An accurate infinite prognostic horizon can also be used to evaluate new asset quality and serve as a basis for maintenance quality assessment, improving maintenance quality such as overhaul quality, installation quality, etc.
5. Continuous production assets: An accurate prognostic horizon of 12 months is the minimum; with N+1 redundancy, an accurate prognostic horizon of at least 6 months is required to avoid any downtime (not merely to reduce downtime). Meeting these requirements can deliver a 35~45% reduction in downtime, a 70% reduction in breakdowns and a 20~25% increase in production (not production efficiency or any other metric), as stated in the US DOE maintenance guideline report. Additionally, an accurate infinite prognostic horizon can be used to evaluate new asset quality and serve as a basis for maintenance quality assessment, improving maintenance quality (overhaul quality, installation quality, etc.) or maintenance capability.
6. Accurate asset prognostics must address the following: uncertainty in system parameters, uncertainty in the nominal system model, uncertainty in the system degradation model, uncertainty in prediction, and uncertainty in failure thresholds.
Artesis technology addresses all of the above and is the most advanced predictive maintenance system.
Predictive maintenance should focus on maintenance, not on predictive analytics. It is not about how clever the predictive algorithm is; "predictive" is only an adjective in the phrase! Predictive maintenance dictates when a maintenance activity should take place, and that activity should be triggered by fault detection; preventive maintenance simply uses a fixed time period in place of fault detection to trigger maintenance. Therefore "fault", or more specifically "incipient fault", should be the focus (incipient faults are covered in a separate article). As for predictive analytics, there are two distinct demands from the two modes of production: the batch production process and the continuous production process (including batch-continuous production). In both modes the prognostic horizon plays a key role, so for each mode we need to discuss the relationship between maintenance and the prognostic horizon. Note that we only discuss technical issues here; the driving force behind these issues is company culture, but we will not extend our discussion to company culture.
Not many people are familiar with the term prognostic horizon. It was first introduced by NASA Ames in 2008. In the 2020s, particle filters (which represent each time state with a skewed distribution) have been used, and it was found that incipient faults follow a log-normal distribution. A deterministic model can be used in a similar fashion, using Kalman filters and system identification. Where a particle filter samples, Artesis instead obtains the asset's design parameters. Where a particle filter weighs each hypothesis, Artesis limits the scope to incipient faults and, as its single hypothesis, measures each signal's deviation from the linear state based on the fault's non-linear properties. And where a particle filter resamples to find the most probable hypothesis, Artesis uses the property that incipient faults degrade exponentially to confirm incipient fault signals. As a result, Artesis has a significantly lower computational cost, and the system functions accurately within 7 minutes.
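For readers new to the term, the prognostic horizon is commonly defined, in the NASA Ames prognostics-metrics work, as the time between the first prediction that falls within an accuracy band of ±α·t_EOL around the true remaining useful life (RUL) and the end of life (EOL). The sketch below implements one common variant, which requires predictions to stay inside the band once they enter it. The function name `prognostic_horizon`, the α value and the example numbers are illustrative assumptions, not Artesis' method.

```python
from typing import Optional, Sequence


def prognostic_horizon(times: Sequence[float],
                       predicted_rul: Sequence[float],
                       t_eol: float,
                       alpha: float = 0.05) -> Optional[float]:
    """Return t_EOL - t_i, where t_i is the earliest time from which every
    RUL prediction stays within +/- alpha * t_EOL of the true RUL.

    Returns None if the predictions never settle inside the band.
    """
    band = alpha * t_eol
    horizon_start = None
    for t, rul_hat in zip(times, predicted_rul):
        true_rul = t_eol - t
        if abs(rul_hat - true_rul) <= band:
            if horizon_start is None:
                horizon_start = t  # candidate start of the horizon
        else:
            horizon_start = None  # prediction left the band: reset
    return None if horizon_start is None else t_eol - horizon_start


# Monthly RUL predictions for an asset that actually fails at month 12
# (made-up numbers): predictions settle inside +/-1.2 months from month 4.
print(prognostic_horizon(list(range(11)),
                         [20, 18, 15, 11, 8.5, 7, 6, 5, 4, 3, 2],
                         t_eol=12, alpha=0.1))  # -> 8
```

A longer horizon means accurate predictions arrive earlier; the "stays inside the band" rule keeps a single lucky early prediction from inflating the result.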
Uncertainty in system parameters: Uncertainty in system parameters can be addressed easily by acquiring design information. For any equipment driven by a 3-phase induction motor, the design information is readily available on the motor nameplate, and the operating condition can be determined from the power factor, frequency and gain (current/voltage). Artesis uses this information to resolve the operating condition through the equivalent circuit equations and the rotational differential equation.
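To illustrate the operating-condition quantities named above, the sketch below estimates supply frequency, power factor and gain (I_rms/V_rms) for one phase from sampled voltage and current. It is a generic signal-processing sketch that assumes a single phase, uniform sampling and a roughly sinusoidal supply; it does not reproduce Artesis' equivalent-circuit or rotational-equation model, and the function name is hypothetical.

```python
import numpy as np


def operating_condition(v: np.ndarray, i: np.ndarray, fs: float) -> dict:
    """Estimate supply frequency, power factor and gain (I_rms / V_rms)
    for one phase from uniformly sampled voltage and current."""
    v = v - v.mean()
    i = i - i.mean()
    # Supply frequency: strongest bin of the windowed voltage spectrum.
    spectrum = np.abs(np.fft.rfft(v * np.hanning(len(v))))
    freqs = np.fft.rfftfreq(len(v), d=1.0 / fs)
    frequency = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
    v_rms = np.sqrt(np.mean(v ** 2))
    i_rms = np.sqrt(np.mean(i ** 2))
    real_power = np.mean(v * i)  # average instantaneous power
    return {
        "frequency_hz": float(frequency),
        "power_factor": float(real_power / (v_rms * i_rms)),
        "gain": float(i_rms / v_rms),
    }


# Synthetic 60 Hz phase: 311 V peak, 10 A peak, current lagging by 0.5 rad.
fs = 10_000.0
t = np.arange(10_000) / fs
v = 311.0 * np.sin(2 * np.pi * 60 * t)
i = 10.0 * np.sin(2 * np.pi * 60 * t - 0.5)
print(operating_condition(v, i, fs))  # ~60 Hz, PF ~cos(0.5) = 0.878, gain ~10/311
```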
Uncertainty in the nominal system model: Following from the above, Artesis employs a deterministic model derived from design information. When the operating condition varies beyond the current model's scope, the user needs to confirm whether the condition is expected; if so, the model can be updated, or the model clusters can be expanded to cover it.
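One way to picture "model scope" is as a set of operating-condition clusters, each covering a range of power factor, frequency and gain. The sketch below flags an operating point that falls outside every known cluster so the user can confirm it, and adds a new cluster once the condition is confirmed. The `ModelCluster` class, the box-shaped bounds and the 2% margin are hypothetical simplifications, not Artesis' data structures.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ModelCluster:
    """Hypothetical box-shaped scope of one nominal model."""
    name: str
    power_factor: Tuple[float, float]
    frequency_hz: Tuple[float, float]
    gain: Tuple[float, float]

    def covers(self, pf: float, f: float, g: float) -> bool:
        return (self.power_factor[0] <= pf <= self.power_factor[1]
                and self.frequency_hz[0] <= f <= self.frequency_hz[1]
                and self.gain[0] <= g <= self.gain[1])


def match_cluster(clusters: List[ModelCluster],
                  pf: float, f: float, g: float) -> Optional[ModelCluster]:
    """Return the first cluster covering the operating point, or None,
    meaning the user should confirm whether the condition is expected."""
    for cluster in clusters:
        if cluster.covers(pf, f, g):
            return cluster
    return None


def expand_with(clusters: List[ModelCluster], name: str,
                pf: float, f: float, g: float,
                margin: float = 0.02) -> ModelCluster:
    """After user confirmation, add a cluster around the new (positive)
    operating point; the relative margin is an arbitrary choice."""
    new = ModelCluster(name,
                       (pf * (1 - margin), pf * (1 + margin)),
                       (f * (1 - margin), f * (1 + margin)),
                       (g * (1 - margin), g * (1 + margin)))
    clusters.append(new)
    return new


clusters = [ModelCluster("full load", (0.80, 0.90), (59.5, 60.5), (0.030, 0.034))]
print(match_cluster(clusters, 0.85, 60.0, 0.032).name)  # full load
print(match_cluster(clusters, 0.60, 60.0, 0.020))       # None: ask the user
```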
Uncertainty in the system degradation model: System degradation can be categorized by fault type: incipient faults (developing faults), abrupt faults and intermittent faults. It is difficult to contain the uncertainty of abrupt faults, and attempting to do so can be very costly and inefficient. Intermittent faults are usually caused by external issues that are, more often than not, beyond the scope of this discussion. Artesis therefore focuses only on incipient faults, which account for roughly 80% of asset failures. For an incipient fault, the rate of deterioration of any specific fault is directly proportional to its current degree of deterioration, hence it follows an exponential model; the signal each individual fault generates also follows an exponential model and a log-normal distribution. How can each individual fault signal be separated? Artesis calculates each individual fault signal on a power spectral density (PSD) platform, which is a quantitative analysis platform. Some uncertainty remains, but the chance that two faults peak in the same frequency band is very low; even when they do, the difference between their deterioration conditions at any given time is usually large because of the exponential degradation model, so the uncertainty is quite low. Since the platform uses a logarithmic Y axis, an exponential function appears as an inclined straight line, which can be used to confirm whether a signal represents an incipient fault. This matters because assets usually generate numerous signals, and some of them can be mistaken for fault signals.
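Because an exponential appears as a straight line on a logarithmic Y axis, a simple check is to fit a line to the logarithm of a fault signal's trend and see how well it fits. The sketch below does this with ordinary least squares; the R² cut-off of 0.9 and the function name are illustrative assumptions, not Artesis' published criteria.

```python
import numpy as np


def exponential_trend(t: np.ndarray, amplitude: np.ndarray,
                      r2_min: float = 0.9) -> dict:
    """Fit log(amplitude) = log(a0) + rate * t and report whether the trend
    looks exponential, i.e. a rising straight line on a log axis."""
    log_a = np.log(amplitude)
    rate, log_a0 = np.polyfit(t, log_a, 1)  # slope, intercept
    fitted = log_a0 + rate * t
    ss_res = np.sum((log_a - fitted) ** 2)
    ss_tot = np.sum((log_a - log_a.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"rate": float(rate), "a0": float(np.exp(log_a0)), "r2": float(r2),
            # Assumed rule: growing (rate > 0) and close to a straight line.
            "looks_incipient": bool(rate > 0 and r2 >= r2_min)}


t = np.arange(12, dtype=float)  # months
print(exponential_trend(t, 1.5 * np.exp(0.3 * t)))  # rate ~0.3, r2 ~1.0, True
print(exponential_trend(t, 2.0 + np.sin(t))["looks_incipient"])  # False
```

The second call stands in for a non-fault signal that fluctuates without a consistent exponential trend.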
Uncertainty in prediction: Artesis is a model-based system: it builds its model from asset design information and refines it against operation. While the model is rendered, numerous parameters such as power factor are generated as by-products, and these can be used to determine whether the operation or load condition affects each individual fault signal. Artesis also separates incipient fault signals by their non-linear properties relative to the system, which effectively eliminates voltage signal noise. Artesis technology is based on MCSA (Motor Current Signature Analysis), which is rarely affected by environmental vibration noise.
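To make the PSD idea concrete, the sketch below estimates a stator-current PSD with Welch's method (`scipy.signal.welch`) and integrates it over a frequency band, returning the band power in dB relative to the supply component. The 50 Hz supply, the band edges and the 38 Hz "fault" tone are assumptions for illustration, not Artesis' fault bands; the sampling rate in the example is chosen so both tones fall exactly on FFT bins.

```python
import numpy as np
from scipy.signal import welch


def band_power(x: np.ndarray, fs: float, f_lo: float, f_hi: float,
               nperseg: int = 4096) -> float:
    """Integrate the Welch PSD of x between f_lo and f_hi (inclusive)."""
    freqs, psd = welch(x, fs=fs, nperseg=nperseg)  # density: units^2 / Hz
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    df = freqs[1] - freqs[0]
    return float(np.sum(psd[mask]) * df)


def band_level_db(current: np.ndarray, fs: float, band: tuple,
                  supply_hz: float = 50.0, half_width: float = 2.0) -> float:
    """Power in `band` relative to the supply-frequency component, in dB."""
    fault = band_power(current, fs, *band)
    supply = band_power(current, fs, supply_hz - half_width,
                        supply_hz + half_width)
    return float(10.0 * np.log10(fault / supply))


# Synthetic current: 10 A supply tone plus a hypothetical 0.1 A tone at 38 Hz.
fs = 4096.0
t = np.arange(20 * 4096) / fs
current = 10 * np.sin(2 * np.pi * 50 * t) + 0.1 * np.sin(2 * np.pi * 38 * t)
print(band_level_db(current, fs, (36.0, 40.0)))  # about -40 dB
```

Because a density-scaled PSD integrates to signal power, band levels computed this way can be compared and trended quantitatively, which is the property the text attributes to the PSD platform.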
Uncertainty in failure thresholds: Most fault signals are analysed in the frequency domain, and PSD is the only platform that supports quantitative analysis in the frequency domain, with calculations on the platform remaining valid; by contrast, an FFT spectrum only supports qualitative analysis unless a standard is provided. Artesis technology provides a way to calculate how far each signal departs from the linear model; this departure represents each signal's non-linear property and translates into the degraded condition of each individual fault. The calculation involves system identification, in which a discrete-time first-order differential (difference) equation is used to model the asset's good-condition state. For each operation and load condition, the threshold is calculated with the pertaining frequency, power factor and gain (current/voltage).
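The system-identification step can be sketched with a first-order discrete model, y[k] = a·y[k-1] + b·u[k-1], fitted by least squares on healthy-state data. The spread of the healthy residuals defines one standard deviation, and new data is scored by how many standard deviations its prediction error departs from the model. This is a generic ARX sketch assuming a single input u and output y; Artesis' actual model structure and threshold calculation (which also use frequency, power factor and gain) are not reproduced here, and the simulated "fault" is only a crude stand-in.

```python
import numpy as np


def fit_first_order(u: np.ndarray, y: np.ndarray) -> tuple:
    """Least-squares fit of y[k] = a*y[k-1] + b*u[k-1] on healthy data.
    Returns (a, b, residual_std)."""
    phi = np.column_stack([y[:-1], u[:-1]])  # regressor matrix
    theta, *_ = np.linalg.lstsq(phi, y[1:], rcond=None)
    residuals = y[1:] - phi @ theta
    return float(theta[0]), float(theta[1]), float(residuals.std())


def deviation_in_std(u: np.ndarray, y: np.ndarray,
                     a: float, b: float, sigma: float) -> float:
    """RMS one-step prediction error on new data, in units of the
    healthy-state residual standard deviation."""
    pred = a * y[:-1] + b * u[:-1]
    rms_error = np.sqrt(np.mean((y[1:] - pred) ** 2))
    return float(rms_error / sigma)


# Usage with simulated data (a = 0.9, b = 0.5 chosen for illustration).
rng = np.random.default_rng(1)


def simulate(n: int, extra_noise: float = 0.0) -> tuple:
    """Simulate the healthy model; extra_noise is a hypothetical,
    crude stand-in for an unmodelled fault component."""
    u = rng.standard_normal(n)
    y = np.zeros(n)
    for k in range(1, n):
        y[k] = (0.9 * y[k - 1] + 0.5 * u[k - 1]
                + 0.01 * rng.standard_normal()
                + extra_noise * rng.standard_normal())
    return u, y


a, b, sigma = fit_first_order(*simulate(2000))
print(round(a, 2), round(b, 2))                              # close to 0.9 and 0.5
print(deviation_in_std(*simulate(500), a, b, sigma))        # about 1
print(deviation_in_std(*simulate(500, 0.08), a, b, sigma))  # about 8
```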
Artesis CEO Ahmet Duyar has authored many NASA fault-detection articles and received both technical and financial support from GE. Without that experience and support, such an ingenious fault detection technology would never have been possible; together they made this unique technology possible.
Unlike other fault detection technologies, Artesis does not apply the prognostic horizon by focusing on late fault detection (i.e. 2~3 months before breakdown); instead it deploys a different approach for each stage, and its threshold design concept is very different. Most technologies only achieve an accurate horizon at roughly 2~3 months, i.e. 2~3 months prior to breakdown, and such information is of very limited use. Why? At an early stage, when the signal deviates 2~3 STD from the linear model, the signal is usually clean unless there is ambient noise (vibration noise for accelerometers and power noise for MCSA); the asset itself may also emit strong non-fault signals, some transducers have inherent noise, and there is structural noise, for example from suspended motor platforms, to name a few. When the fault has degraded to 8 STD from the linear model, which usually indicates a 6-month prognostic horizon (6 months prior to breakdown), any fault can trigger a different fault type; the industry calls this the knock-on effect, which makes this a critical threshold. If you are lucky, no other fault signal overlaps the same bandwidth in the frequency domain. Even if multiple faults overlap in the same bandwidth, each fault deteriorates at a different rate and all of them degrade exponentially, so their magnitude difference also grows exponentially, and the one with the fastest rate becomes prominent before the asset breaks down. Hence at a late stage the fault signal becomes prominent and ready for analysis. Of course there may be exceptions, but I have not yet encountered any.
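The argument about overlapping faults can be written out. Assume, as the text does, that two faults in the same band grow exponentially, with amplitudes A_1(t) = A_1(0)e^{r_1 t} and A_2(t) = A_2(0)e^{r_2 t} and r_1 > r_2. Their ratio then grows exponentially, and on a logarithmic axis the gap between them widens linearly:

```latex
\frac{A_1(t)}{A_2(t)} = \frac{A_1(0)}{A_2(0)}\, e^{(r_1 - r_2)\,t},
\qquad
\ln A_1(t) - \ln A_2(t) = \ln\frac{A_1(0)}{A_2(0)} + (r_1 - r_2)\,t .
```

So even if fault 2 starts larger, fault 1 overtakes it after t^* = ln(A_2(0)/A_1(0)) / (r_1 - r_2), and from then on its lead keeps growing, which is why the fastest-degrading fault becomes prominent before breakdown.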
The essence of predictive maintenance is to plan maintenance activities according to asset condition. When is the optimal timing? More often than not, it is before entering the knock-on effect zone, the period in which a fault induces other faults. Based on experience, when an individual fault signal exceeds 8 STD from the linear model, there is a high risk of breakdown within 6 months, and the asset could break down at any time after 3 months; this is the knock-on effect zone. Because Artesis effectively addresses noise, fault signal overlap, transducer accuracy and operation or load variation, it has a long prognostic horizon, which could be called an infinite prognostic horizon: as soon as an asset leaves the factory, Artesis technology can perform an accurate quantitative analysis of the new asset's manufacturing quality. Imagine having an accurate outgoing quality control (OQC) lab for every asset. The fault detection accuracy exceeds 70% (including abrupt and intermittent faults), meaning almost all incipient faults are detected. If a model has not been rendered, it is difficult to see beyond a 6-month prognostic horizon, or the accuracy is equivalent to tossing a coin to decide whether a fault exists. (Kyusung Kim, "MCSA: Signal-Based Versus Model-Based Fault Diagnosis. A Trade-Off in Complexity and Performance")
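If a fault's deviation grows exponentially, d(t) = d_0·e^{rt}, the time left before it crosses a threshold such as the 8 STD knock-on level follows directly: t = ln(d_threshold / d_0) / r. The sketch below applies this formula; the growth rate would come from a trend fit, and the example numbers are made up for illustration.

```python
import math


def months_to_threshold(current_std: float, rate_per_month: float,
                        threshold_std: float = 8.0) -> float:
    """Time until an exponentially growing deviation reaches threshold_std.

    Assumes d(t) = current_std * exp(rate_per_month * t). Returns 0.0 if the
    threshold is already reached and inf if the fault is not growing.
    """
    if current_std >= threshold_std:
        return 0.0
    if rate_per_month <= 0:
        return math.inf
    return math.log(threshold_std / current_std) / rate_per_month


# A fault at 3 STD that doubles every 3 months (rate = ln 2 / 3 per month):
print(round(months_to_threshold(3.0, math.log(2) / 3), 1))  # -> 4.2
```

In this made-up case the fault would enter the knock-on zone a little over four months from now, which is the window available to plan the repair.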
Furthermore, every piece of equipment has a different design, and so does every part, so there is no exact pattern for the knock-on effect; it has been observed that a combination of different faults can constitute a knock-on effect event. Fortunately, for rotary assets the degradation model is exponential, so the difference in degradation condition between faults also increases exponentially, and the resulting error has minimal effect. There are errors nonetheless.
Why does an accurate prognostic horizon matter? Would you subject your production line to high risk by reducing your stock of raw materials? If an expediting service is required, its cost would exceed 3 months of stocking cost, yet accounts receivable do not look good on financial reports. For domestic freight, a buffer of at least 2 weeks is needed, and for overseas freight at least 1 month, unless a premium is paid, and prices fluctuate.
How should the prognostic horizon and thresholds be chosen to optimize predictive maintenance? For continuous production mode, a 1-month prognostic horizon or threshold is barely effective, if at all: it merely replaces breakdowns with planned downtime, and the reduction in downtime cost is almost negligible. For batch production mode, it can be used to optimize maintenance capability and quality, since breakdowns usually incur minimal cost thanks to production flexibility. A long prognostic horizon can also be used to increase MTBR (Mean Time Between Repair) by avoiding the knock-on effect zone, which requires optimizing MTBR against fault-induced excess power consumption. Therefore three thresholds should be defined: the median level of a good new asset out of the factory, a fault entering the knock-on effect zone, and a very early-stage fault. All three thresholds are equally important! For continuous production mode, the early-stage fault threshold draws attention to monitoring the degradation rate to estimate breakdown time, and can effectively avoid breakdowns when the accurate prognostic horizon is equal to or longer than the production overhaul period, which is usually set to 1 year to suit social behavior.
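The three thresholds can be applied as zones on a fault's deviation from the linear model. In the sketch below the good-new-asset median is passed in (for example from fleet data, in the same STD units and assumed lower than the early-stage level), the early-stage level defaults to 2.5 STD (the middle of the 2~3 STD range mentioned earlier) and the knock-on level to 8 STD. The zone names and defaults are illustrative assumptions, not Artesis' published settings.

```python
from enum import Enum


class Zone(Enum):
    BETTER_THAN_NEW_MEDIAN = "at or below good-new median"
    ACCEPTABLE = "above median, below early-fault level"
    EARLY_FAULT = "early-stage fault: monitor degradation rate"
    KNOCK_ON = "knock-on effect zone: plan repair now"


def classify(deviation_std: float, good_new_median: float,
             early_std: float = 2.5, knock_on_std: float = 8.0) -> Zone:
    """Map a fault's deviation (in STD) to one of the three-threshold zones.
    Assumes good_new_median < early_std < knock_on_std."""
    if deviation_std >= knock_on_std:
        return Zone.KNOCK_ON
    if deviation_std >= early_std:
        return Zone.EARLY_FAULT
    if deviation_std > good_new_median:
        return Zone.ACCEPTABLE
    return Zone.BETTER_THAN_NEW_MEDIAN


for d in (0.8, 1.9, 4.0, 9.5):
    print(d, classify(d, good_new_median=1.2).name)
# BETTER_THAN_NEW_MEDIAN, ACCEPTABLE, EARLY_FAULT, KNOCK_ON
```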
Once a predictive maintenance strategy is in place, further data collection is required to formulate a maintenance plan. A strategy applies where the context is known but the specifics are not; to close the gap and formulate a plan, more specific data is required. A good company should have a well-defined culture in which values are prioritized, so that strategy can follow that value priority to fit the context, and specific data should be collected for each specific event to form a plan. Given the number of different faults, the scope of maintenance is clearly vast, so we can divide it into maintenance capability and maintenance quality, both of which can be measured quantitatively with an accurate prognostic horizon. Repair quality should focus on each individual fault, and the deviation from the linear model can be used as the prognostic-horizon measure of asset condition. A comprehensive prognostic horizon must quantify a good-quality out-of-factory asset and use it as the goal of maintenance quality, and maintenance capability can use it as the goal of restoring assets to out-of-factory condition.
Good quality for an out-of-factory asset can be defined for each type of fault signal. Keep in mind that some assets have inherent non-fault signals that take time to differentiate; a trend chart makes this much easier. Artesis uses the residual from a model, i.e. the deviation from the linear model, and has resolved a big dataset containing tens of millions of good and faulty assets. This is the best practice so far, but the following issues need to be resolved: noise separation and elimination, and projecting the impact of operation and load variation. These can usually be handled easily in a quality lab but are very difficult during operation, especially in continuous production mode. (For those interested, see the Predictive Maintenance Quick Guide.)
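A sketch of how a "good new asset" reference could be derived: take readings for one fault band from a population of newly manufactured good assets, use the median as the target line, and express a given asset by its percentile within that population. The function name, the data layout and the assumption that lower readings are better are hypothetical; the real Artesis reference is built from its own dataset.

```python
import numpy as np


def new_asset_reference(fleet_readings: np.ndarray, reading: float) -> dict:
    """Compare one asset's fault-band reading with good new assets.

    fleet_readings: deviations (e.g. in STD) for the same fault band, one
    value per newly manufactured good asset. Lower is assumed to be better.
    """
    median = float(np.median(fleet_readings))
    # Share of good new assets whose reading is at or below this asset's.
    percentile = float(np.mean(fleet_readings <= reading) * 100.0)
    return {"median": median,
            "percentile": percentile,
            "meets_target": reading <= median}


fleet = np.array([0.6, 0.8, 0.9, 1.0, 1.1, 1.3, 1.5])  # made-up fleet values
print(new_asset_reference(fleet, 0.85))  # median 1.0, ~29th percentile, meets target
```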
From the diagram above: when a predictive maintenance system has only a 3-month PH (Prognostic Horizon), longer PHs are masked, and all faults will be perceived as happening randomly. No correlation can be drawn with repair quality, many deem maintenance capability a waste of resources, and maintenance capability degrades to the 3-month PH level; hence MTBR drops, and people take asset aging for granted. It is a vicious cycle in which maintenance capability is neglected and aging is taken for granted. In such a scenario, faults exceed the knock-on effect threshold and trigger many other faults, vastly reducing MTBR and causing excess power consumption (2~6%), and this "aging" reduces asset useful life. With good maintenance capability and quality, there will be no "aging".
(For those interested, see the Predictive Maintenance Guide for Batch Production.)
When a predictive maintenance system has a 6-month PH, as illustrated in the diagram above, PHs beyond 6 months are masked; the improvement is barely noticeable, MTBR is longer, and a trend can barely be observed.
From the illustration above, it can be seen that most so-called predictive maintenance systems are only condition monitoring systems. Condition monitoring systems serve multiple purposes, ranging from predictive maintenance to protection, with completely different functions and performance. Not all predictive maintenance systems can serve advanced modern predictive maintenance; some do not address any predictive maintenance needs and are merely an accelerometer with a DAQ and a fancy digital interface, leaving users to their own devices. There are also SCADA systems that use AI and claim to be predictive maintenance systems, but every system follows the rule of thumb: garbage in, garbage out. All uncertainties need to be addressed before deploying AI solutions, or AI can be used as a tool to handle those uncertainties. The Taiwan government has been subsidizing pure AI predictive maintenance systems for years; the specification dictates that such a system must start from scratch, build a stochastic model and improve continuously. It is not difficult to see that such a product is futile. According to the bathtub curve, infant-mortality failure is far more likely than normal failure, and re-installation after an overhaul also triggers infant-mortality failures. For electric motors, two surveys have been done, by IEEE and EPRI; IEEE reported that the area under the infant-failure region is 300 times that of the normal failure period, so a pure AI system starting from scratch cannot be expected to perform during infant failure. Furthermore, if the system learns from fault events: the electric motor surveys state that the motor failure rate is roughly 7% per asset per year, and a complete asset driven by a motor has a failure rate of around 10%. Don't forget that there are more than 100 fault types that can break down an asset, meaning such a system will not reach any satisfactory performance before the asset reaches end of life.
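A rough order-of-magnitude reading of those survey figures, assuming (only for illustration) that the roughly 10% annual failure rate of a motor-driven asset is spread evenly across about 100 fault types:

```latex
\lambda_{\text{per type}} \approx
\frac{0.10\ \text{failures/asset/year}}{100\ \text{fault types}}
= 0.001\ \text{failures/asset/year},
\qquad
\frac{1}{\lambda_{\text{per type}}} \approx 1000\ \text{asset-years}.
```

In other words, a system that learns only from its own fault events would, on average, need on the order of a thousand asset-years to observe a single failure of any given type on one asset.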
Please also note that good maintenance capability and quality can completely eliminate aging.
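The green dotted line in the figure above is hard to see, but it is visible on close inspection, just below the "caution" line. It is the median of data from tens of millions of newly manufactured motors.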
Artesis also provides a bar chart that presents the top 80% most probable faults concisely; with these charts, diagnoses for 80% of incipient faults can be generated automatically. The green dotted line is the good-asset median threshold, which can be used as the target for maintenance improvement in both capability and quality. Sometimes reaching this target would exceed the budget, so it should be optimized. If maintenance capability is lagging behind, reaching the target is just a matter of time as long as top management is involved. Without top management involvement, not only will maintenance capability fail to improve, but the prognostic horizon will also be deemed irrelevant. Running without optimized predictive maintenance is like paying an ex-works tariff imposed by top management's negligence.
Sometimes great maintenance capability is used in the wrong way. In Taiwan, the water utility is state-owned and operates in an extremely lean fashion, and people wonder why drinking tap water is hazardous. It has barely any engineering capability and is run mostly for commercial purposes, so it outsources pump maintenance through tenders awarded to the cheapest bid. Maintenance companies have accumulated enough experience to ensure that pump repair quality barely covers the bidding period, usually 2 years, and they deliberately lose the bid so that their competitor wins the service tender; within a few months, the competitor experiences a breakdown. A contractor with great maintenance quality then wins the next bid, and soon drives out its competitor.
Can any company achieve target maintenance quality without predictive maintenance? Yes, but it is very rare. We once encountered a person with a fault-detection superpower. He was introduced to us when the government wanted to confirm our fault detection performance, and our automatic reports were matched against his judgement. He uses his hands as transducers to detect faults, and on mechanical faults his hands performed comparably to our automatically generated reports. He mentioned that he only buys second-hand motors; to him, it is like picking a sweet watermelon. Based on his experience, some repair shops can deliver a more durable motor than a new one, but he also said he paid a lot of tuition: learning from countless false alarms and missed faults with his hands.
Outsourcing maintenance should be done with caution; unless economies of scale are unfavorable, it is advisable to keep maintenance in-house. Any maintenance vendor usually has a conflict of interest with the customer, since the vendor favors more business opportunities, meaning more downtime and faster asset aging; if both parties have an exceptionally good relationship, one of them is not doing their job. Some vendors fake repairs and send back a good asset, and some repairs are done just to meet the spec. On one occasion a customer asked us for a free demo, and the vendor thought we were bluffing about our performance. We found a faulty asset and performed the initial test; the vendor then repaired the asset, and the repair was checked with accelerometers, with the measurement within the operable range. Then we did the final measurement and found that the asset was worse than at the initial test. I asked the vendor whether he had only tightened the asset casing; he admitted it, but the repair SOP had not been breached. When the casing was forcefully tightened, the vibration reading dropped sharply, but the fault persisted, and since the vibration was contained within the casing, the fault degraded further.
Another instance of predictive maintenance in practice: at the gate of a publicly listed steel fab, more new motors are carried out by lorries than steel products, so much so that it gives the impression there is an electric motor plant inside the gate. There is not; instead, there is a team of 20 predictive maintenance staff. We had been assigned by government agencies to perform a free Artesis demonstration at that steel fab, so we intercepted four of the "faulty" electric motors and tested them with Artesis AMTPRO. These motors were as good as new, well below the green dotted line, meaning they were in better condition than the median of newly manufactured good motors. The steel fab is controlled by a family with a 9~11% stake; it would not be surprising to find that they also fully own the electric motor repair shop. Hence top management involvement can also work completely against predictive maintenance.
Although Artesis technology has made predictive maintenance easy, do not neglect asset insights and merely wait for alarms. It works best when the information is correlated with those insights. The information should be reviewed constantly; MTBR alone only gives you a quantitative figure. Fault degradation rates may vary, and the variation can tell you about parts quality, installation quality, or load and operation variations. The system is not a fortune-telling machine; it does not know the insights, and the user has to find them out.
If the predictive maintenance system you are using cannot detect early faults or has a limited PH, it will not only fail to help improve maintenance quality, it will also impose a ceiling on maintenance quality: management will question the reason for spending extra money when the system reports no issues, and this becomes a vicious cycle. Good maintenance capability and quality take time and experience to nurture and are very fragile, which is why it is necessary to choose a good predictive maintenance system to do the job. Do not put up with mediocre systems; to avoid such a vicious cycle, it is better not to have one at all.
For continuous production mode, the first priority is to ensure production is not interrupted; the budget for this depends on the cost of a breakdown and is usually sufficient. For batch production mode, predictive maintenance usually requires optimization because the cost of a breakdown is limited: MTBR needs to be optimized against the extra power consumed by existing faults, asset aging, and the high cost of maintaining or developing good maintenance quality.
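The MTBR-versus-power trade-off can be sketched with a toy cost model. Assume, hypothetically, that each repair costs C_r, that after a repair a fault's excess power draw grows as p(t) = p_0·e^{rt} (as a fraction of the asset's monthly energy cost E), and that the asset is repaired every T months. The average monthly cost is then (C_r + E·p_0·(e^{rT} - 1)/r) / T, and a grid search finds the T that minimizes it. All parameter names and values are illustrative; the text only states that MTBR must be optimized against excess power consumption, not this particular model.

```python
import numpy as np


def monthly_cost(T: np.ndarray, repair_cost: float, energy_cost: float,
                 p0: float, rate: float) -> np.ndarray:
    """Average monthly cost of repairing every T months (toy model).

    Excess power fraction after a repair is p0 * exp(rate * t); its cost over
    one interval is energy_cost * p0 * (exp(rate * T) - 1) / rate.
    """
    excess = energy_cost * p0 * np.expm1(rate * T) / rate
    return (repair_cost + excess) / T


def best_interval(repair_cost: float, energy_cost: float, p0: float,
                  rate: float, max_months: float = 60.0) -> tuple:
    """Grid-search the repair interval (months) with the lowest monthly cost."""
    T = np.linspace(0.5, max_months, 1200)
    cost = monthly_cost(T, repair_cost, energy_cost, p0, rate)
    i = int(np.argmin(cost))
    return float(T[i]), float(cost[i])


# Hypothetical inputs: 3000 per repair, 2000 per month of energy,
# 2% initial excess power growing 10% per month.
T_opt, c_opt = best_interval(repair_cost=3000.0, energy_cost=2000.0,
                             p0=0.02, rate=0.1)
print(f"repair every ~{T_opt:.1f} months, ~{c_opt:.0f} per month")
```

In this model a more expensive repair pushes the optimal interval out, while a faster-growing fault pulls it in, which is the kind of balance the text describes.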
In the diagram below, although the red line seems the most ideal, there are times when the green line is more economical.
To understand maintenance optimization, it is best to gather all cost information linked directly to every maintenance event and activity, ideally digitally in a CMMS or EAM system. Without knowing the context and insights, predictive maintenance may incur extra cost. The economics of operation, sometimes even intangible, is a horizon more important than the prognostic horizon. A solid plan is always more reliable than the smartest strategy.
Achieving good maintenance quality requires significant resources and experience. Without Artesis technology, an accurate prognostic horizon takes time to build from experience, and it is usually a chicken-and-egg problem: without an accurate prognostic horizon, predictive maintenance has no value, and without successful predictive maintenance, no one cares about the prognostic horizon. It is always a good choice to start with a system that has an accurate, long prognostic horizon. More often than not, a small budget is assigned to predictive maintenance, a few thousand dollars are spent on cheap sensors, and then the vicious cycle begins. A short and inaccurate prognostic horizon is almost a curse on a predictive maintenance program: people perceive breakdowns as inevitable and believe all machines age and eventually break down, important assets go back to reactive maintenance, and maintenance work is deemed a waste of resources.
How is downtime cost measured? Maintenance staff do not face customers directly and have probably never met one, so how can they be expected to care about customers? They probably don't know the applications of the products they produce, so the only thing they can do is calculate what is on the accounting books, but the actual downtime loss is 2.5 to 3 times what the books show, and most companies even hide some items in the wrong accounts. Keep in mind that there are other competitors in the market, and it usually takes a lot of effort and resources to develop new customers.