Maintenance Nov 018
November, 2003
were then tested using a simulation model that sought to minimize the costs
associated with repair. The model showed that opportunistic policies performed
well and were very robust, although far-sighted hard-time policies (where cars
are subject to an aggressive replacement of parts at fixed intervals) could also
perform well in certain circumstances.
Review of Maintenance Concepts
There are three main types of maintenance activities: (a) scheduled maintenance;
(b) repair maintenance; (c) on-condition maintenance. Two types of failure can
be distinguished: in-service failure and incidental failure. An in-service failure
is detected at a time when replacement will cause disruption to services, while an
incidental failure is detected while the equipment is already out of use. Severity
of failures could be classified into three categories: (a) regulatory failure; (b)
engineering failure; (c) catastrophic failure. Four main types of preventative
maintenance strategies can be distinguished: (1) no preventative maintenance;
(2) age replacement (intervals of length of time, or usage); (3) block replacement,
where components are replaced at fixed intervals in addition to failure-related
replacement; (4) opportunistic maintenance, where the replacement decisions
are based on the state of the rest of the system.
Critically, Little distinguished between constant failure rate (CFR), increasing
failure rate (IFR), and decreasing failure rate components (such as certain
components subject to a burn-in period). Complex systems of IFR components
will exhibit CFR behavior in the long run (Barlow & Proschan, 1965), an
important result that favours on-condition policies over hard-time policies.
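The three failure-rate classes can be illustrated with a Weibull hazard function. This is a common textbook choice and an assumption here, not a distribution named in the source; the shape and scale values are purely illustrative.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Instantaneous failure rate h(t) of a Weibull lifetime distribution."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def classify(shape):
    """Map the Weibull shape parameter to a failure-rate class."""
    if shape < 1:
        return "DFR"  # decreasing failure rate, e.g. during burn-in
    if shape == 1:
        return "CFR"  # constant failure rate (exponential lifetimes)
    return "IFR"      # increasing failure rate (wear-out)


if __name__ == "__main__":
    # Compare the hazard early (t = 0.5) and late (t = 2.0) in life.
    for shape in (0.5, 1.0, 2.0):
        early = weibull_hazard(0.5, shape)
        late = weibull_hazard(2.0, shape)
        print(classify(shape), round(early, 3), round(late, 3))
```

For an IFR component the hazard rises with age, which is what makes age-based replacement attractive; for a CFR component, replacement at a fixed age buys no reliability.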
How is Railcar Maintenance Done?
To understand how maintenance can be scheduled (or to what extent scheduling
is really prudent or even possible) and optimized for both efficiency and
effectiveness, a thorough understanding of how railcars are maintained on a day-to-day basis is needed. Relying on the author's experience at the former British Rail
(BR), interviews with a number of American transit agency maintenance
personnel, and literature reviews, this section serves to document the current
state-of-practice of maintenance regimes and scheduling policies.
Survey of The State of Practice
The former British Rail had developed three different maintenance regimes in its
50-year existence. Prior to 1980, equipment was brought to the Main Works for
general overhauls periodically. This is essentially a variable-time greedy policy,
The general problem with hard-time policies (Little, 1991) is that there may not
be a magic-number interval which is the best time to do everything. As long as
the life cycles of individual components are not even multiples of the basic interval
(a condition that most likely exists), hard-time policies will not be the most
efficient. Purely on-condition policies are also inefficient, as they disregard the
costs of failure and potential savings in joint maintenance activities.
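The inefficiency of a single hard-time interval can be made concrete with a small sketch. It assumes each component is replaced at the last scheduled visit before its expected life expires; the component names, lives, and the 20,000-mile interval are hypothetical and not taken from the source.

```python
def hard_time_waste(component_lives, base_interval):
    """For each component, return (replacement point, fraction of life wasted)
    when replacement must coincide with a multiple of the base interval."""
    result = {}
    for name, life in component_lives.items():
        visits = int(life // base_interval)  # whole intervals within one life
        if visits == 0:
            raise ValueError(f"{name} fails before the first scheduled visit")
        replace_at = visits * base_interval
        result[name] = (replace_at, (life - replace_at) / life)
    return result


if __name__ == "__main__":
    # Hypothetical expected lives in miles.
    lives = {"brake_pads": 25000, "door_motor": 60000, "compressor": 90000}
    for name, (at, waste) in hard_time_waste(lives, 20000).items():
        print(f"{name}: replace at {at} miles, {waste:.1%} of life unused")
```

Only a component whose life is an exact multiple of the interval is used up fully; every other component gives up residual life at each replacement.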
The inspection cycle is constrained both by the physical plant and personnel
availability at the MBTA. However, some improvements are possible. The
reliability of the fleet may be improved by introducing a tiered maintenance plan
whereby certain components are inspected every other inspection cycle. If
reliability could be improved to reduce repair maintenance workload, total costs
could be reduced. Productivity may also improve through higher utilization of
residual component life, although component costs are not generally significant
compared to the opportunity cost of the railcar and the labor.
Although the MBTA keeps track of mileage and failure data for the fleet (MBTA,
2002), it would benefit from collecting detailed cost data in parts and labor, and
delay-minutes data by failure mode. Failure mode data is currently kept through
dispatcher reports, which can sometimes be inaccurate. Some shop personnel in
charge of repair keep records of the precise mode of failure; however, the quality
of the data is inconsistent and very little analysis is routinely done to identify problem
components. These are potential areas of improvement.
Better cost data would enable better cost-benefit analyses. Typically, the cost of
failure in-service will greatly outweigh any maintenance savings. If satisfying the
market demand requires far fewer railcars than available (such that cars sitting
out of service incur zero opportunity costs), significant benefits are most likely to
accrue by minimizing failure rates, since the cost of failure is much more
significant than the preventative maintenance costs. On the other hand, if a more
aggressive preventative maintenance program will result in lower asset
utilization, the benefit of reliability gain must then exceed the economic cost of
the program.
Performance Measures
There are four basic types of maintenance performance measures, corresponding
to the four main objectives of maintenance: (1) Instances of high-risk technical
failures (potentially a safety violation); (2) Mean usage between failures (a
reliability measure usage measured in miles, time, or cycles); (3) Availability (a
supply measure); (4) Maintenance cost or productivity (a cost-effectiveness
measure). The tension between availability and reliability on the road is well
known. Some of the measures are adversely affected by variables outside the
maintenance manager's control: apart from maintenance effectiveness, factors
such as vehicle design, the weather, and fleet age all affect reliability. The
caveats are documented elsewhere, and senior managers should be aware of
these issues before chastising maintenance managers. Little (1991) discusses a
number of alternate performance measures, such as mean distance between
maintenance events, and distance per in-service failure, which are designed to
balance preventative with repair maintenance. Some of these could be adaptable
to a transit authority environment.
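Two of the measures above reduce to simple ratios. The sketch below assumes usage is measured in miles (giving mean distance between failures, MDBF) and that availability is the share of the fleet ready for service at a given time; the function and parameter names are illustrative, not drawn from any MBTA system.

```python
def mdbf(total_miles, failures):
    """Mean distance between failures; undefined when no failures occurred."""
    if failures <= 0:
        raise ValueError("MDBF is undefined without at least one failure")
    return total_miles / failures


def availability(cars_available, fleet_size):
    """Share of the fleet available for service at the measurement time."""
    if fleet_size <= 0:
        raise ValueError("fleet_size must be positive")
    if not 0 <= cars_available <= fleet_size:
        raise ValueError("cars_available must lie between 0 and fleet_size")
    return cars_available / fleet_size


if __name__ == "__main__":
    # Hypothetical monthly figures for a 120-car fleet.
    print(mdbf(1_200_000, 48))
    print(availability(108, 120))
```

Holding cars back for extra preventative work lowers the second ratio in the hope of raising the first, which is the tension between availability and reliability noted above.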
Maintenance Performance and Social Externality Interactions
In the literature on maintenance performance measures, the interaction between
costs and other performance measures is not often discussed. Traditionally,
maintenance has been considered to be a support function that provided
whatever the transportation department required. The systems approach
towards operations would require the costs and benefits of different operations &
maintenance plans to be evaluated together quantitatively. The state of practice in
evaluating maintenance effectiveness has tended to specify the required
availability and reliability and to minimize the number of high-risk technical incidents,
while remaining within budget. This has usually led to an over-emphasis on
availability and reliability, while insufficient attention is paid to costs, both direct
maintenance costs and the opportunity cost of vehicle time. If a real issue is
made of maintenance costs, safety could become neglected over time, as
evidenced by Philadelphia streetcars in the 1970s and the privatized British Rail in
the late 1990s.
The MBTA case study provides a good illustration of this phenomenon. Anecdotal
observation of shop-level initiatives suggests many maintenance managers are
very efficient local optimizers, meaning that they are adept at reducing
maintenance costs locally while maintaining the same reliability and availability.
For example, Riverside Carhouse of the MBTA Green Line pioneered a new
method of inspecting a critical part on articulated trolleys, tripling the throughput
compared with methods envisaged by the manufacturers. Another initiative to
reduce wheel-wear rates is currently being debated. However, often the
interaction between opportunity cost of assets and maintenance is neglected,
since those costs fall outside the jurisdiction of the maintenance manager (and
sometimes the transit authority). For instance, work scheduling is sometimes
done to maximize utilization of repairers and physical plant, and not the vehicle,
even against a background of chronic car shortages. Maintenance managers
sometimes indicated that reduced service levels will lead to greater reliability and
lower maintenance costs, without considering the societal costs of spilled or
operations and maintenance are traded off to explicitly maximize the consumer
surplus.
Maximize Service
Maximize Consumer_Surplus
Maximize Fleet_Performance
Subject to {Maintenance_Budget, Plant, Personnel,
Engineering, Safety,
Operational_Requirements}
Alter (Maintenance_Plan)
Performance Measure {MDBF, Availability}
Algorithm 1: In the traditional paradigm for transit service planning, MDBF and
availability are maintenance output measures and are treated as constants in
operations planning. The Maintenance Department failed if a specific level of
MDBF was not delivered, while the Transportation Department assumed a
specific level of fleet availability.
MBTA failure data treats each two-car set (married pair) as a basic entity. Thus,
if one car has a traction motor cut-out, the whole two-car set is considered
defective, but not the entire consist. Although the remaining vehicles in the
consist that did not fail will be returned to service relatively swiftly, service is
disrupted, and all the management costs associated with an actual failure would
be incurred on the entire consist. At other transit properties, fixed-formation sets
of up to five cars are seen, giving rise to unique issues in measuring reliability.
There are other subtle effects that may occur due to the way the transportation
operation is managed, e.g. six-car consists may run all day with up to two trucks
unpowered, resulting in vehicles being dragged in a failed condition, yet
continuing to accrue mileage. To assess accurately the failure rates and the
consequence of failures, detailed consideration of these effects would be
required. At this point, it suffices to say that the current approach ignores such
consist-level effects, and considers everything at the set level.
Structure of the Model
An example model might consist of five separate parts of economic evaluations of
the rush-hour operating plan: (1) the economic cost of spilled passengers, due to
inadequate capacity provided to carry rush-hour loads; (2) the cost of providing
operators for the consists, which varies with the operating plan; (3) the cost of
unreliability, which depends on the probability of failure and the impact of
failures on customers; (4) the cost of waiting time, which depends on the
headway; (5) the expected life-cycle cost of the equipment, based on the
operations & maintenance (O&M) plan. In the current example, values are
enumerated using fairly standard modelling methodologies in some level of detail,
with an aim to capture the quality of service as seen from the perspective of the
average rush-hour customer. Several O&M plan scenarios were then used to
drive the model, resulting in a comparative evaluation from a public-benefit
perspective. The current analysis ignores any budgetary constraints, although
obviously this shortcoming could be mitigated if only the subset of O&M
scenarios that are actually feasible from a budgetary standpoint is fed to the
model.
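The five-part structure lends itself to a simple comparison of O&M plans by total social cost. The following is a minimal sketch under the assumption that all five components are expressed in dollars over the same period; the class, field names, and scenario figures are hypothetical and do not reproduce the author's spreadsheet.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PlanCosts:
    """Rush-hour cost components for one O&M plan, all in dollars."""
    spill: float          # (1) economic cost of spilled passengers
    operators: float      # (2) operators for the consists
    unreliability: float  # (3) expected cost of in-service failures
    waiting: float        # (4) passenger waiting time
    life_cycle: float     # (5) expected equipment life-cycle cost

    def total(self):
        """Sum of all five components."""
        return sum(getattr(self, f.name) for f in fields(self))


def cheapest_plan(plans):
    """Return the name of the plan with the lowest total social cost."""
    return min(plans, key=lambda name: plans[name].total())


if __name__ == "__main__":
    # Hypothetical scenarios, not results from the source.
    plans = {
        "current": PlanCosts(500.0, 4000.0, 300.0, 9000.0, 6000.0),
        "more_cars_on_road": PlanCosts(100.0, 4400.0, 450.0, 7500.0, 6200.0),
    }
    print(cheapest_plan(plans))
```

A budget constraint could be respected by filtering the dictionary of plans before the comparison, in line with the mitigation suggested above.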
The Passenger Spill Model
The transit passenger spill model was conceptually inspired by airline revenue
management spill models developed by Hopperstad (1998), Belobaba and others.
In a given time period where transit demand exceeds transit capacity, passenger
spill occurs. Unlike airlines where seats are individually assigned, transit
capacity is a soft number. Spill, or passengers beyond planned capacity, is
likely to result in one of three outcomes: (a) overcrowding on the train as
[Equation: Spill as a function of D, h, t, s, and c. The expression and the symbol definitions are not recoverable from the source.]
Clearly, this model assumes deterministic transit operations (i.e. that no train
bunching occurs), deterministic transit demands within one time period, and that
spilled passengers simply disappear without impacting the demand in the
following time period. These assumptions are obviously not true, and the model is
therefore likely to underestimate the actual spill that occurs. However, using a
conservative number for capacity per vehicle, which represents the nominal capacity
that many transit authorities declare in their service standards, could mitigate these
shortcomings.
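A deterministic spill calculation consistent with the assumptions just listed might look like the sketch below. The functional form (demand in excess of the capacity delivered in a period, with no carry-over) is an assumption made for illustration and is not necessarily the author's expression; all parameter names and figures are hypothetical.

```python
def period_capacity(period_min, headway_min, cars_per_train, capacity_per_car):
    """Passenger capacity delivered in one time period, assuming evenly
    spaced trains (no bunching)."""
    if min(period_min, headway_min, cars_per_train, capacity_per_car) <= 0:
        raise ValueError("all inputs must be positive")
    trains = period_min / headway_min
    return trains * cars_per_train * capacity_per_car


def spill_by_period(demands, capacity):
    """Passengers beyond planned capacity in each period. Spilled passengers
    are not carried into the next period, matching the stated simplification."""
    return [max(0.0, d - capacity) for d in demands]


if __name__ == "__main__":
    # Hypothetical: 15-minute periods, 5-minute headway, six cars of 130.
    cap = period_capacity(15, 5, 6, 130)
    print(cap, spill_by_period([2000, 2500, 2340], cap))
```

Using a nominal rather than a crush capacity per car makes the spill estimate larger, which works against the downward bias of the deterministic assumptions.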
The volume of passenger spill could then be converted into an economic dollar
value: the consumers' willingness to pay to avoid being spilled by a particular
subway train. The construct here borrows the economic idea of value-of-time from
passenger demand forecasting; it is entirely theoretical, and may
not translate into actual monetary values. The spilled passenger is assumed to
wait until the next time-period and take a following train; thus the social
externality per spilled passenger is simply the product of value-of-time and
time-period. For simplicity,
it is assumed that capacity would be available in the following time-period (if not,
the passenger will choose to endure overcrowded conditions). This will tend to
underestimate the economic cost, but is realistic since it reflects real-world
passenger behaviour. Many refinements are possible; however, this model
represents a first-cut effort at trying to explicitly estimate the societal costs of
providing inadequate capacity during the rush-hour. Exhibit 1.2 illustrates a
particular run of the transit passenger spill model.
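The conversion from spill volume to a dollar figure is a single product. The sketch below assumes value-of-time is expressed in dollars per hour and the time-period in hours; the $12 per hour figure is hypothetical.

```python
def spill_cost(spilled_passengers, value_of_time_per_hour, period_hours):
    """Social cost of spill: each spilled passenger is assumed to wait one
    full time period for a following train."""
    if min(spilled_passengers, value_of_time_per_hour, period_hours) < 0:
        raise ValueError("inputs must be non-negative")
    return spilled_passengers * value_of_time_per_hour * period_hours


if __name__ == "__main__":
    # Hypothetical: 160 spilled passengers, $12/hour, 15-minute period.
    print(spill_cost(160, 12.0, 0.25))
```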
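[Equation: p(Fail), expressed in terms of MDBF and the quantities m, l, and c (product mlc). The expression and the symbol definitions are not recoverable from the source.]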
Consideration of the operating plan outside the rush hour is not necessary since
small changes in the number of units made available at the beginning of the
workday would only affect the service level during the rush-hour. The vehicle
requirements were then used to calculate the expected social cost due to in-service failures using the following expression:
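$$\mathrm{Cost}(\mathrm{Fail}) = p(\mathrm{Fail}) \cdot d \cdot r \cdot v$$
where $d$ is the average delay per failure, $r$ is the number of passengers impacted by a given incident, and $v$ is the average passenger value of time.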
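The expected failure cost can be evaluated directly once the four terms are known. This sketch reads the expression as a plain product and assumes d is given in minutes and v in dollars per hour, so a unit conversion is applied; the probability, ridership, and value-of-time figures are hypothetical.

```python
def failure_cost(p_fail, delay_min, riders_impacted, value_of_time_per_hour):
    """Expected social cost of in-service failure: p(Fail) * d * r * v,
    with the delay d converted from minutes to hours."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    if min(delay_min, riders_impacted, value_of_time_per_hour) < 0:
        raise ValueError("inputs must be non-negative")
    delay_hours = delay_min / 60.0
    return p_fail * delay_hours * riders_impacted * value_of_time_per_hour


if __name__ == "__main__":
    # Hypothetical: 5% chance of a failure, 10-minute delay, 900 riders, $12/hour.
    print(round(failure_cost(0.05, 10, 900, 12.0), 2))
```

Because the probability term is small when failures are rare, this cost stays modest unless the number of riders affected per incident is very large.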
The MBTA considers any railcar-related incident that causes more than four
minutes of delay to be a failure. The average delay per failure (d) is assumed to be
approximately ten minutes, based on January 2002 track circuit data compiled by
an automatic computer programme (81 events, average 7.67 minutes, variability
0.48; Wile, 2002) and the author's experience. The r term is an estimate of the
number of passengers impacted by a given incident. Given that the failure will
occur at a random time during either the morning or the afternoon rush, on
[Equation: cost of waiting time. Recoverable terms: headway, in hours; one-way rush-hour ridership; average passenger value of time.]
enhanced maintenance, the costs of unreliability and spill are all swamped by the
need for frequency, since frequency changes affect all riders while unreliability or
spill only affect a small proportion of riders.
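The dominance of frequency can be seen in a rough waiting-time calculation. The sketch below assumes the conventional approximation that riders arriving at random wait half a headway on average; that approximation, along with the ridership and value-of-time figures, is an assumption for illustration and is not confirmed by the source.

```python
def waiting_cost(headway_hours, ridership, value_of_time_per_hour):
    """Total waiting-time cost, assuming an average wait of half a headway."""
    if min(headway_hours, ridership, value_of_time_per_hour) < 0:
        raise ValueError("inputs must be non-negative")
    return (headway_hours / 2.0) * ridership * value_of_time_per_hour


def headway_saving(old_headway_min, new_headway_min, ridership, vot_per_hour):
    """Waiting-cost reduction from tightening the headway (in minutes)."""
    old = waiting_cost(old_headway_min / 60.0, ridership, vot_per_hour)
    new = waiting_cost(new_headway_min / 60.0, ridership, vot_per_hour)
    return old - new


if __name__ == "__main__":
    # Hypothetical: 10,000 one-way rush-hour riders at $12/hour.
    print(round(headway_saving(5, 4, 10_000, 12.0), 2))
```

Every rider benefits from a shorter headway, so even a one-minute change is multiplied by the full ridership, whereas failure and spill costs are multiplied only by the riders actually affected.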
Discussion
Some of these results are exceedingly counter-intuitive, if not controversial.
Essentially, this cost-benefit analysis suggests that if riders valued their time with
the same dollar value regardless of the circumstance, then reliability really does
not matter. Given the current level of technical performance achieved by electric
trainsets, since failures are rare, keeping spare trainsets to cover for failures or
allow additional maintenance to enhance reliability is a futile exercise. According
to the analysis, the trainsets are always better employed on the road, where they
are reducing waiting time (through enhanced service frequency) for all riders.
Since traincrew costs are insignificant compared to the level of social benefit they
generate, the allegedly optimal strategy is to run as many trainsets as you've got
every morning, and run 'em till they drop. Spill is almost a non-issue because it
occurs only at the peak of the peak.
There are a number of problems with this analysis, from a strictly technical
perspective. Firstly, the capital costs of acquiring trainsets have not been taken
into account. This absurd result may have its origins in trainset acquisition when
the fleet size was determined from the peak loadings. If the fleet size was
determined by rational analysis rather than the need to cater to every traveller at
a time that they choose to travel, it is likely that the marginal trainsets would
never have been justified. Secondly, this evaluation is essentially a marginal
analysis; thus none of the larger costs and benefits has been taken into account.
Instead, we have assumed they are constant. It is possible to consider a spilled
passenger as value lost rather than time lost: the trip might be worth $10 to
consumers who could not get on the train because it was full. Different
conclusions might then be reached. None of the larger questions have been
considered, such as the use of transit versus automobiles, and congestion pricing.
Equipment Failures, Headway Degradation, and Consumers' Perception of
Unreliability
Intuitively, there are other issues with this analysis. Are the values-of-time really
the same? A survey of regular subway commuters will likely reveal that in reality,
with perfect schedule adherence, there is little difference between headways of
five-, four- and three-minutes. On the other hand, there is a big difference
between headways of five- and ten-minutes. Thus, all the alleged benefits of
increasing frequency disappear, though the costs of operators and spill are real.
sensitive to local variables such as number and location of train doors. Further
data collection is necessary to calibrate a model. An implicit assumption in the
present model is that at any given time, the maintenance shop has perfect
knowledge as to which train will break down next, and therefore will not send out
the marginal train except when absolutely necessary (i.e. in rush-hour service).
Without such information, it is necessary to consider the disruption an unreliable
trainset may cause in the off-peak hours.
The Effect of Reduced Maintenance on Equipment MDBF
How MDBF would change with the maintenance regime is a function of the
proportion of failures that are preventable (i.e. attributable to inadequate or
shoddy maintenance) versus random (i.e. effects of unforeseeable events such as
weather, previously unknown design failures, etc.). For MDBF to reach 25% of its
former value in a reduced maintenance scenario would require 75% of the
current delay-minutes to be caused by preventable failures. Actual failure data
would improve this aspect of the model. Results here are necessarily vehicle-specific and empirical.
Preliminary attempts were made to correlate historical Boston temperature data
with failure modes, using MBTA Failure-in-Service System, Vehicle History Report
(Orange Line 07/01/02 to 01/23/03). With a dataset containing 687 incidents
(including 118 service interruptions), a correlation of 0.64 was found between
temperatures of under 25°F, and a moving average of a composite failure index of
air, brake, and door components (See Appendix 1). In one interpretation, this
suggests 10% of all service disruptions are attributable to inclement weather
causing drainage valves to freeze and rubber to fracture, and therefore no
improvement is possible here with preventative maintenance. Another
interpretation suggests that regular rebuilds of valves and better rubber
materials may help. A third source believes that even though preventative
maintenance may not mitigate all air system faults in cold weather, a combined
maintenance/operating strategy may still decrease service interruptions.
Freight railroads operating in the Great White North have long limited train
lengths to 60 to 70 cars during the winter, to allow for changes in mechanical
system performance due to increased leakage and denser air. In addition, wheel
and track fractures increase dramatically in cold weather, giving rise to
further sources of unreliability.
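The kind of temperature correlation described above can be sketched as follows. The 25°F threshold comes from the text; the seven-day trailing window is a guess based on the appendix variable name FailIdx7, and the construction of the composite failure index is not specified here, so the function simply takes a daily index as input. Any data passed in would need to come from the actual failure records.

```python
from math import sqrt


def moving_average(values, window):
    """Trailing moving average; early entries average what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def freeze_correlation(min_temps_f, daily_failure_index,
                       threshold_f=25.0, window=7):
    """Correlate a below-threshold indicator with the smoothed failure index."""
    freeze = [1.0 if t < threshold_f else 0.0 for t in min_temps_f]
    smoothed = moving_average(daily_failure_index, window)
    return pearson(freeze, smoothed)
```

A correlation computed this way says nothing about whether the cold-weather failures are preventable, which is why the competing interpretations above remain open.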
Whatever your belief is with respect to cold weather, other failure modes exist for
which there is no preventative remedy. Broken windows and slashed seats are
the result of vandalism, while wheel flats could result from a faulty brake system,
a malfunction in the signalling system, or repeated misuse of the emergency
brake by traincrew. However, these only account for 9% of total incidents and 3%
of service interruptions. Nonetheless, these are areas of maintenance that most
customers perceive as representative of the general state of the system. To what
extent maintenance should focus on rectifying mechanical malfunctions versus
cosmetic deficiencies is another topic worthy of research. Artifacts in the MBTA
data suggest such cosmetic issues are commonly under-reported. Failures have
also not been categorized consistently.
Theoretically, it is possible to attribute delay minutes (results of service failure) to
one of the following categories: (a) equipment failure related delay, likely
triggered by an external condition; (b) other unexplained equipment failure
related delay, possibly preventable; (c) delays unrelated to equipment failures,
to include infrastructure failures, resource allocation issues, and perturbations of
service resulting from the stochastic element of the system (i.e., dwell times and
train operator characteristics). Earlier work on freight rail reliability
demonstrated that only 30% of shipment delays are related to rail technology
(i.e., equipment plus infrastructure). For a detailed review of extant literature,
see Kraft (1998). A future study to classify causes for delay may reveal a
similarly low loading on total delay minutes of equipment failures, with the
implication that even a significantly lower MDBF will not seriously hamper
service reliability if other improvements could be made, such as more aggressive
headway management. The loading is necessarily a function of route length,
headway, and resources available to recover from an equipment failure event
and could conceivably lead to different conclusions for different operating
environments.
The Evaluation Framework
Although the framework clearly warrants further refinement, the evaluation
methodology is clearly valid and extremely powerful. The traditional view that
the maintenance department exists and serves at the pleasure of the
transportation department is no longer tenable. Instead, a systems-approach to
service planning is required, with the opportunity cost of assets being explicitly
traded off between transportation and maintenance disciplines. Evaluation
should be in terms of social benefits and costs, both externalities and real costs
to the transit authority, such as operator wages.
The spreadsheets shown in the exhibit demonstrate that a reasonably simple
decision-support model could be developed without sophisticated computer
systems or access to vast arrays of data. Both conceptual and computational or
data-based refinements will improve this model. Some assumptions also need to
be verified through field exercises and customer surveys. It is the customer who
will ultimately judge the success of the transit authority in both transportation
and maintenance.
The moral of the story might be stated as: maximizing reliability, maintenance
actions, or service alone is not the right answer. Instead, a multi-objective
analysis is needed to explore alternative maintenance regimes that could improve
service without hugely detrimental effects on reliability; for instance,
maintenance of trainsets at night while targeting 95% availability.
Conclusions
There are many nuances of maintenance operations and performance indicators
that require careful thought before using them to make decisions. The
maintenance managers (and line chiefs, responsible for both transportation and
maintenance) who are familiar with these caveats may achieve better planning.
Operations planning for transportation and maintenance should be done
concurrently, using a cost-benefit analysis evaluation framework. Instead of the
current standards-driven requirement for availability, minimum service levels,
reliability, and other such proxies of service quality, trade-offs between these
important variables should be explicitly considered. Assigning monetary values
to each of the variables is possible, provided consumer research could back up
these values.
MBTA could improve its effective maintenance operations by moving towards a
more objective-driven evaluation framework, and perhaps by moving towards a
variable-interval maintenance regime. There are many innovations that could be
tried, such as maintaining trainsets overnight instead of during the day, tracking
costs much more aggressively, and studying the mechanical failure profile of the
train components more closely to make informed decisions about when to
replace.
In the same way that safety standards of the past have moved towards risk
assessment and risk management paradigms, service standards and reliability
standards ought to be moving in the direction of service cost-benefit analysis and
service management. More research is definitely needed in this area to establish
exactly what a cost-benefit analysis would mean in the context of maintenance
and transportation management, and market research is required to identify and
act on customer preferences. Although MBTA may be able to obtain a more
efficient maintenance operation through changes in the maintenance regime, at
some level the existing maintenance regime is functional and supports operations
well. Identifying and quantifying (with a dollar value) the transportation
department's needs in terms of customer preference should precede any attempt
Explanation of variables (definitions not recoverable from the source): AirFault, AirIntrp, BrkIntrp, IsHalfDoor, IsSideNotOpen, FailIndex, FailIdx7, MinTemp, Freeze, Correl.
References
Little, Patrick. Improving Railroad Freight Car Reliability Using a New
Opportunistic Maintenance Heuristic. MIT Thesis (1991).
Haven, Paul. Transit Vehicle Maintenance: Framework for Development of More
Productive Programs. MIT Thesis (1980).
Morrison, Gavin. Chapter 5, Maintenance Under The New Maintenance Policy,
The Class 47. Ian Allan, London (1999).
Coogan, Matthew and Stanley, Robert. New Paradigms for Local Public
Transportation Organizations, TCRP Research Results Digest 55 (TCRP Project J-08B). Transportation Research Board, Washington D.C. (2002).
Wile, Erik S. Using Automatically Collected Data to Enhance Transit Operations:
Data Source, Proprietary Database. Massachusetts Institute of Technology,
Cambridge, Mass. (2002).
Central Transportation Planning Staff. Passenger Counts for the Massachusetts
Bay Transportation Authority: Heavy Rail Subway and Light Rail Surface Lines,
Boston, Mass. (1995, 97).
Macchi, Richard A. Expressing Trains on the M.B.T.A. Green Line, MIT Thesis,
Cambridge, Mass. (1990).
Deckoff, Anthony A. The Short-Turn as a Real Time Transit Operating Strategy,
MIT Thesis, Cambridge, Mass. (1990).
Kinki-Sharyo Company. Maintenance Handbook for Massachusetts Bay
Transportation Authority No.7 Light Rail Vehicle. Osaka, Japan (1986).
Massachusetts Bay Transportation Authority. Heavy Rail Equipment Maintenance
Department Fiscal Year End Report, Boston, Mass. (1991).
Massachusetts Bay Transportation Authority. MCRS System Daily/Monthly
Operating Reports (March), MBTA Proprietary, Boston, Mass. (2003).
Lu, Alex. Comparative Analysis of Maintenance Practices, Proceedings of 1.259
Term Papers, Massachusetts Institute of Technology, Cambridge, Mass. (2003).
Soeldner, David W. A Comparison of Control Options on the MBTA Green Line.
MIT Thesis, Cambridge, Mass. (1993).
Kraft, Edwin R. Rail-Related Literature Review (Ch.2), from A Reservations-Based Railway Network Operations Management System. Ph.D. Dissertation,
Department of Systems, University of Pennsylvania, Philadelphia, Penn. (1998).
Note: The D term, transit link-flow demand at the 15-minute resolution, shown
on a separate portion of the spreadsheet, is used in calculating the passenger
spill numbers, and is not reproduced here. Interested readers should refer to the
data source (CTPS, 1995, 97).