FMEA (Failure Modes and Effects Analysis) and its Usage in FSAE

University of Pittsburgh student Helena Ciavarelli with the cooling subsystem FMEA

Failure Modes and Effects Analysis (FMEA) is a structured method for analyzing component and system reliability. It is a framework for looking at how something fails (the failure mode) and what happens when it does (the failure effect).

History of FMEA

https://qualitytrainingportal.com/resources/fmea-resource-center/fmea-history/

The US military was one of the early users of FMEA, developing the technique to reduce variation in munitions production and the failures that variation caused. NASA adopted it shortly after, and it is widely credited as an influence on the success of the Apollo missions. Ford Motor Company was the first automaker to adopt it, during internal audits of the Ford Pinto's safety issues in the 1970s.

It’s worth understanding the risks these practices are meant to avoid, and why they were implemented in the first place. Regulations are written in blood: we learn from the grave mistakes of others and always work to do better next time.

Life before FMEA in Automotive. Brady Holt via Wikimedia Commons

Accounting for failure in FSAE is done in a variety of ways, and most students can tell a judge which parts will fail, how they will fail, and roughly what the effect will be. FMEA is a tool to capture these failures (what's the difference between f-ing around and science? Writing it down.) along with the ways the team is minimizing their risk, since these failures can impact timeline, resources, competition results, safety, and vehicle performance.

What it’s Made of

FMEA is typically captured in a spreadsheet and is intended to be a document that is revisited at logical intervals of a project. For an FSAE team, this may be during preliminary design phases, during design reviews, after build cycles, and before competition. Each line item features several fields, which are described below.
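The fields this post walks through (failure mode, cause, effect, the three ratings, current controls, and recommended actions) map naturally onto spreadsheet columns. As a minimal sketch, a line item could be modeled as a Python dataclass and exported as CSV that any spreadsheet tool can open. The `FmeaRow` class, its column names, and the sample ratings are illustrative assumptions, not a standard template.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class FmeaRow:
    """One line item of a hypothetical FMEA spreadsheet."""
    item: str                  # component, subsystem, or process under analysis
    failure_mode: str          # what broke
    cause: str                 # why it broke (one cause per row)
    effect: str                # what happens to the system or team
    severity: int              # rating on the team's chosen scale
    probability: int
    detection: int
    current_controls: str = ""
    recommended_actions: str = ""


def to_csv(rows: list[FmeaRow]) -> str:
    """Serialize rows to CSV text, one header column per dataclass field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(FmeaRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


rows = [
    FmeaRow(
        item="Team budget",
        failure_mode="Loss of funding",
        cause="Budget paperwork not submitted",
        effect="Can't buy new dampers this year",
        severity=5, probability=3, detection=3,  # hypothetical ratings
    ),
]
print(to_csv(rows))
```

Keeping one dataclass per row means the same structure can later carry computed values (such as a risk score) without changing the spreadsheet layout by hand.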

While it’s technically possible to perform FMEA on a whole vehicle, a whole project, etc., it’s wiser to narrow the scope as much as is reasonable. FMEA on a cooling system, BMS, pedalbox, or other component- or subsystem-level piece can be significantly easier and add more value. The scope can be as granular as “brake pedal bolt” or as wide as “ergonomics system”. Pick your poison.

AIAG and VDA (two automotive industry organizations that write standards, similar to SAE/ISO/etc.) jointly publish an FMEA manual that recommends the 5Ts to determine the scope, the type, and the basis of analysis of an FMEA. From Quality-one.com, discussing the AIAG-VDA FMEA manual:

InTent

Ensure that the team members are competent to participate in the FMEA based on their experience and role in the FMEA development process. More importantly they should understand the purpose of the FMEA.

Timing

In order to gain the most benefit from the FMEA, it should be a “before-the-event” process and not an “after-the-fact” exercise. It is much easier to make changes to a design or process prior to design completion or process implementation.

Team

The FMEA team should include members from different disciplines who have the subject matter knowledge and experience to obtain the greatest benefit. The AIAG & VDA manual provides a great deal of information and insight regarding the team members and their various roles and responsibilities.

Task

The seven-step process outlined in the manual clearly identifies the tasks and deliverables at each phase of the FMEA development. The team should also be prepared to share information with management at various times during the process.

Tools

There are many different software tools on the market that can be used for FMEA development. In some cases, organizations develop their own internal software. In addition, there is always the traditional form-based exercise utilizing the standard spreadsheet method. The manual provides an example of both a software and a spreadsheet developed FMEA.

In short, the FMEA should be built before the failures happen, by more than one person (ideally a small group of stakeholders from the relevant and adjacent sections) who understand why it’s being done, using industry-standard tools.

Source: https://quality-one.com/aiag-vda-fmea/

The Failure Mode

In short, this describes what broke. These may be damaged pieces, fatigued parts, burned items, loss of funding, an act of God, etc.

The Cause

The root cause of the failure. For engineering failures, here are some examples:

  • Material properties

  • Material geometry

  • Loading conditions

  • Interfaces with mating components

  • Energy transfers

  • Data transfers

  • Environmental influences (water, dirt, temperature)

FMEA can also be performed on processes and systems, where the causes look different; remember the 6 M’s:

  • Man (human errors)

  • Methods (rules, regulations, and processes)

  • Material (quality, availability, and applicability of materials used)

  • Machinery (the team’s tools and resources)

  • Measurement (how the team measures quality and state of the team/car)

  • Mother Earth (Environment, acts of God, the University, SAE)

There may be many different causes for a failure mode. Each should be captured on its own line, as the effects, controls, and recommended actions may differ. Root Cause Analysis is a whole ✨thing✨ and not really the topic of this blog post, so I’ll gloss right over that.
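Splitting causes onto separate lines is mechanical enough to sketch. The hypothetical helper below takes one failure mode and a list of causes and returns one row per cause, so each row can later receive its own effect, controls, and ratings. The `CauseRow` class, `split_by_cause` function, and the pedalbox causes are illustrative picks from the lists above, not a real analysis.

```python
from dataclasses import dataclass


@dataclass
class CauseRow:
    """Hypothetical minimal row: one failure mode paired with one cause."""
    failure_mode: str
    cause: str
    effect: str = ""            # filled in per cause, since effects may differ
    current_controls: str = ""  # likewise may differ per cause


def split_by_cause(failure_mode: str, causes: list[str]) -> list[CauseRow]:
    """Return one row per cause instead of lumping all causes into one cell."""
    return [CauseRow(failure_mode=failure_mode, cause=c) for c in causes]


rows = split_by_cause(
    "Brake pedal bolt fractures",
    [
        "Material properties: bolt grade too low",
        "Loading conditions: peak pedal force underestimated",
        "Environmental influences: corrosion from water exposure",
    ],
)
for row in rows:
    print(f"{row.failure_mode} | {row.cause}")
```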

The Effect

This describes the effect of the failure on the system or the team. Two examples:

The (Failure Mode) loss of funding is (Caused) by the team failing to submit budget paperwork and the (Effect) is they can’t buy new dampers this year.

The (Failure Mode) crash in brakes test is (Caused) by incorrect braking calculations and the (Effect) is car broke :(

Severity, Probability, Detection, and Risk Priority Number

After determining the mode, cause, and effect, the line item is assigned three ratings: the Severity of the failure if it were to happen, the Probability of it actually happening, and how well the team's current Detection tools would catch it before it does damage.

These numbers are assigned in a variety of ways. What matters is giving each scale enough levels, spaced far enough apart, to accurately describe the differences between items. As an example, we will use a 1-3-5 scale:

Severity

1 - Insignificant; little to no schedule delay, safety risk, or reduction in quality

3 - Minor; delay in schedule, minor safety issue, or reduced quality in product

5 - Major; complete standstill in schedule, major injury or death, or failure in delivery/deadline/competition

Probability

1 - Occurrence unlikely, or no prior events

3 - Occurrence 0-10% of the time, maybe 1-2 prior events

5 - Occurrence >10% of the time, a regular prior occurrence

Detection

1 - Detection tools in place; 0 prior incidents of missed detection

3 - Some detection tools in place; 1-2 prior instances of missed detection

5 - No detection tools in place; several prior instances of missed detection

A team may want to introduce more levels, modify the numbers, etc.; this is fine. Every company has a slightly different numbering scheme, and what matters is capturing the levels and agreeing on what each one means. Arguing about whether something is a 3 or a 5 is part of the process, so don’t be distressed when the guy on the team who is Never Wrong™️ truly believes something is a 5. Maybe just give that one to him and be more conservative, eh? Anyway, out of the weeds and back on topic.
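Because every team picks its own levels, it can help to write the scale down once and check every rating against it. A minimal sketch, assuming the 1-3-5 scale above; the `SCALES` dictionary and `validate_ratings` function are hypothetical names, and the level descriptions are abbreviated from the lists above.

```python
# The 1-3-5 example scale from this post, keyed by rating category.
SCALES = {
    "severity": {
        1: "Insignificant",
        3: "Minor",
        5: "Major",
    },
    "probability": {
        1: "Occurrence unlikely, or no prior events",
        3: "Occurrence 0-10% of the time",
        5: "Occurrence >10% of the time",
    },
    "detection": {
        1: "Detection tools in place",
        3: "Some detection tools in place",
        5: "No detection tools in place",
    },
}


def validate_ratings(severity: int, probability: int, detection: int) -> None:
    """Raise ValueError if any rating is not a defined level on its scale."""
    ratings = {"severity": severity, "probability": probability, "detection": detection}
    for category, value in ratings.items():
        if value not in SCALES[category]:
            allowed = sorted(SCALES[category])
            raise ValueError(f"{category}={value} is not one of {allowed}")


validate_ratings(5, 3, 5)      # passes silently
try:
    validate_ratings(4, 3, 5)  # 4 is not a level on this scale
except ValueError as err:
    print(err)                 # severity=4 is not one of [1, 3, 5]
```

If the team adds levels later, only `SCALES` needs to change; the check itself stays the same.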

Risk Priority Number

RPN is the product of the three ratings above: Severity × Probability × Detection. It characterizes how “big” a risk is and helps the team choose what to act on first. On the 1-3-5 scale above, RPN ranges from 1 to 125.
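A minimal sketch of the calculation and ranking, assuming the 1-3-5 scale above. The two rows reuse this post's running examples, but their ratings are hypothetical: the funding row gets Detection 1 because of the monthly budget reviews, and the brakes row gets Detection 5 because nobody can check the calculator.

```python
from dataclasses import dataclass


@dataclass
class RatedRow:
    description: str
    severity: int     # 1-3-5 scale
    probability: int  # 1-3-5 scale
    detection: int    # 1-3-5 scale

    @property
    def rpn(self) -> int:
        """Risk Priority Number: the product of the three ratings."""
        return self.severity * self.probability * self.detection


rows = [
    # Hypothetical ratings for the two running examples in this post.
    RatedRow("Loss of funding (budget paperwork missed)", 5, 3, 1),
    RatedRow("Crash in brakes test (bad brake calcs)", 5, 3, 5),
]

# Highest RPN first: these are the items to act on first.
for row in sorted(rows, key=lambda r: r.rpn, reverse=True):
    print(f"{row.rpn:>3}  {row.description}")
```

With these ratings the brakes row scores 75 and the funding row scores 15, so the brakes row sorts to the top even though both failures are equally severe.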

It should be noted that in 2019 the Automotive Industry Action Group (AIAG) retired the RPN approach in automotive in favor of a simpler standard written jointly with the Verband der Automobilindustrie (VDA), though the core of the concept is largely the same. Instead of doing math directly, each line item is assigned an Action Priority of High, Medium, or Low. However, RPN is still standard nearly everywhere else; I personally still use RPN in my role in driving simulator design, and used it when I was in nuclear power. Additionally, IEC 60812 still recommends RPN.

Current Controls

The current controls field is used to describe the ways the team currently deals with the risk associated with this failure mode. These should describe the processes, checks, verification methods, and so on that reduce the risk of this failure occurring. Some examples:

The (Failure Mode) loss of funding is (Caused) by the team failing to submit budget paperwork and the (Effect) is they can’t buy new dampers this year. (Currently) the team has internal deadlines to complete the budget 1 week before it is due to the university, and examines the budget in monthly review meetings with several team members.

The (Failure Mode) crash in brakes test is (Caused) by incorrect braking calculations and the (Effect) is car broke :(. (Currently) the team has no controls to monitor this because only one person knows how the brakes calculator spreadsheet actually works and they graduated 6 years ago.

Recommended Actions

As the name suggests, Recommended Actions are ideas for new controls to implement to reduce the risk of this line item. Usually these involve changing or adding processes, reevaluating a design, seeking outside help, and so on. Recommended Actions are intended to reduce the Probability, Severity, and/or Detection ratings of the line item.

Decision

Decisions are derived from the RPN, typically by comparing it against threshold values. The following decisions are available:

  • Avoid

  • Mitigate

  • Accept

Avoidance is for the highest RPNs, and requires Recommended Actions to be implemented to reduce the risk.

Similarly, Mitigation is for mid-level RPNs and suggests Recommended Actions to be implemented.

Finally, Acceptance is for low-RPN risks; Recommended Actions may still be worth implementing, but they are not a priority.

Note that Acceptance is the target state for most items. It means accepting that the risk is still present, but that enough Current Controls are in place that no further action is required to reduce it. If someone ever tries to tell you that they’ve completely eliminated a risk, tell them I have a bridge in Brooklyn to sell them.
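The thresholds themselves are up to the team. On the 1-3-5 scale, the only possible RPNs are 1, 3, 5, 9, 15, 25, 27, 45, 75, and 125, so a minimal sketch might use the cutoffs below. The values `AVOID_AT = 45` and `MITIGATE_AT = 15` are assumptions for illustration, not a standard.

```python
from enum import Enum


class Decision(Enum):
    AVOID = "Avoid"
    MITIGATE = "Mitigate"
    ACCEPT = "Accept"


# Hypothetical thresholds for the 1-3-5 scale; every team sets its own.
AVOID_AT = 45
MITIGATE_AT = 15


def decide(rpn: int) -> Decision:
    """Map an RPN to a decision using fixed threshold values."""
    if rpn >= AVOID_AT:
        return Decision.AVOID     # Recommended Actions required
    if rpn >= MITIGATE_AT:
        return Decision.MITIGATE  # Recommended Actions suggested
    return Decision.ACCEPT        # current controls judged sufficient


for rpn in (75, 15, 9):
    print(rpn, decide(rpn).value)
```

Because the scale only produces those ten values, it is worth checking which ones land on each side of a cutoff before agreeing on it.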

What next?

As stated above, FMEAs are living documents that are revisited and updated at regular intervals. When Recommended Actions are implemented, they move into the Current Controls column and the line item is re-rated, which reduces its RPN and may change its decision. In theory, this exercise is repeated until every risk meets the team’s acceptance criteria, whether that’s an RPN at or below some value, all items marked as Accept, or some other agreed metric.
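One pass of that loop can be sketched in a few lines. Everything here is hypothetical: the `Row` class, the `implement_action` helper, the same assumed Avoid/Mitigate/Accept cutoffs (45 and 15), and the new ratings, which in practice come from the team re-scoring the line after review, not from code.

```python
from dataclasses import dataclass, field


def decide(rpn: int) -> str:
    """Hypothetical thresholds for the 1-3-5 scale."""
    if rpn >= 45:
        return "Avoid"
    if rpn >= 15:
        return "Mitigate"
    return "Accept"


@dataclass
class Row:
    description: str
    severity: int
    probability: int
    detection: int
    current_controls: list[str] = field(default_factory=list)
    recommended_actions: list[str] = field(default_factory=list)

    @property
    def rpn(self) -> int:
        return self.severity * self.probability * self.detection


def implement_action(row: Row, action: str, probability: int, detection: int) -> None:
    """Move an implemented action into Current Controls and re-rate the row.

    Severity is left unchanged here, assuming this particular action changes
    how likely and how detectable the failure is, not how bad it would be.
    """
    row.recommended_actions.remove(action)
    row.current_controls.append(action)
    row.probability = probability
    row.detection = detection


action = "Document the brakes calculator and peer-review it"
brakes = Row(
    "Crash in brakes test (bad brake calcs)",
    severity=5, probability=3, detection=5,
    recommended_actions=[action],
)
print(brakes.rpn, decide(brakes.rpn))  # 75 Avoid
implement_action(brakes, action, probability=1, detection=3)
print(brakes.rpn, decide(brakes.rpn))  # 15 Mitigate
```

The row moves from Avoid to Mitigate but is not yet accepted, which is exactly why the exercise gets repeated at the next review.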

Okay, but why?

FMEA is a standard practice in industry that is not only a valuable skill to have on a resume, but also a valuable tool to make the car do good more often and go fast without hurting anyone (probably).

Characterizing failures can help students identify shortcomings in their systems, especially complex or poorly defined systems that always seem to need fixing at the last minute. A mature FMEA can offer guidance on where to look first when failures occur, reducing downtime. As the program cycles through and new students join, it can help them get up to speed on how the program “ticks” and how processes like this make the program and the vehicle successful.

Regarding safety, FMEA can capture the dangerous things these cars may do when put in an unsafe state, and can help the team verify that they meet or exceed the safety regulations and requirements. Not only may that net a better Design score (the rubric literally asks whether the team has exceeded the rulebook’s safety standards in the Cockpit, Controls, and Driver Safety section), but it can also be used to communicate with folks who don’t have intimate knowledge of the system but are remarkably bothered by how “unsafe” it is, such as the University, document judges, and your local SCCA-er.

And it should be noted that an FMEA is required for Electronic Throttle Control (ETC) in the IC class, and used to be required for EV documentation. Who knows, maybe it will come back? (Please volunteer for documentation reviews so we can mature our verification processes, thanks <3).

A practical example

The Pittsburgh Shootout publishes its FMEA for site safety. Students can review it here:

https://docs.google.com/spreadsheets/d/1dAgxpR94R8mhTuMNykopEIkIxLRzPd8csvb159GR-Og
