When we design for reliability, we usually focus on high-quality components operated conservatively and well within their absolute maximum specs. We expect such designs to become technically obsolete long before they suffer a significant percentage of failures.
But that isn’t all there is to reliability. To be truly reliable, a product needs not only to have a long service life, but also to work the way it’s intended to, right from the start. Optimized printed-circuit board (PCB) design and component mounting can be the difference between the prototype operating correctly the first time and hours of frustrating troubleshooting and re-design.
There are four broad areas of reliability related to PCB layout and component mounting: electrical, mechanical, thermal, and thermomechanical. The following sections explain the basic analysis required and some design best practices.
Electrical reliability requires clean power and low-impedance grounding. Traces must be laid out to minimize crosstalk and clock skew. Appropriate shielding is needed to block interference and prevent logic glitches.
Getting power and grounding “right” requires an analysis of the parasitic interaction between the system board and IC packaging. This analysis can then be used to simulate power-bounce and ground-bounce under worst-case loading conditions.
Software tools can perform this analysis, some working with or within the PCB layout software. They can (for example) reveal where narrow traces, cutouts in the power and ground planes, or inadequate current-return paths might cause inductive noise spikes. These tools can optimize filter-capacitor positioning to further clean up the power and ground planes.
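Before running a full simulation, a rough hand estimate can indicate whether bounce is likely to be a problem. The sketch below uses the common first-order approximation V = N × L × (dI/dt) for N outputs switching at once through a shared return inductance. The function name and the example values are illustrative assumptions, not figures from any particular device or tool.

```python
def ground_bounce_volts(n_switching, shared_inductance_h, delta_i_a, edge_time_s):
    """First-order bounce estimate: V = N * L * (dI/dt).

    Assumes every output switches at the same instant and that all of
    them share one lumped return inductance (a deliberate simplification).
    """
    if edge_time_s <= 0:
        raise ValueError("edge_time_s must be positive")
    return n_switching * shared_inductance_h * delta_i_a / edge_time_s


# Hypothetical case: 8 outputs, 1 nH shared inductance,
# 20 mA current step per output, 1 ns edge.
bounce = ground_bounce_volts(8, 1e-9, 20e-3, 1e-9)
print(f"Estimated ground bounce: {bounce:.3f} V")
```

The estimate scales linearly with the number of simultaneously switching outputs and with the shared inductance, which is why narrow traces and plane cutouts in the return path matter so much.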
Another way to ensure clean signal delivery is to give each single-ended signal line its own tightly parallel return-current path. This layout mimics a transmission line (with its more-or-less constant impedance), reducing mismatches that cause signal reflection and I/O disturbances.
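The effect of an impedance mismatch can be put into numbers with the standard reflection coefficient, Γ = (Z_load − Z0) / (Z_load + Z0). The following is a minimal sketch for purely resistive impedances. The 50-Ω and 65-Ω values are hypothetical and chosen only for illustration.

```python
def reflection_coefficient(z_load_ohm, z0_ohm):
    """Fraction of the incident voltage reflected at an impedance step.

    Valid for real (resistive) impedances; 0 means a perfect match.
    """
    total = z_load_ohm + z0_ohm
    if total == 0:
        raise ValueError("z_load_ohm + z0_ohm must be non-zero")
    return (z_load_ohm - z0_ohm) / total


# Hypothetical case: a 50-ohm trace runs into a 65-ohm section,
# for example where its return path is interrupted.
gamma = reflection_coefficient(65.0, 50.0)
print(f"Reflected fraction of incident voltage: {gamma:.3f}")
```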
Slots or breaks in the power and ground planes above and below the signal lines can corrupt current-return paths. Some design tools might not properly account for slots and breaks, so it’s important to run simple test cases to confirm that the software correctly detects and quantifies such problems.
For differential signals, the positive and negative lines must have the same delay. A sufficiently large delay mismatch, or skew (“sufficiently” varying with the application, of course), might cause dropped bits because the differential voltage is too low to be read reliably. Packets then have to be resent, slowing data transfers.
Power and ground bounce can cause signal distortion on differential lines, resulting in “eye diagram” collapse and lost data bits (Fig. 1).
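Whether a given delay difference matters can be judged by comparing it with the bit period, or unit interval (UI). The sketch below converts a trace-length mismatch into skew and expresses it as a fraction of the UI. The propagation delay of about 6.7 ps/mm is an assumed round figure for a typical FR-4 inner layer, and the mismatch and data rate are hypothetical.

```python
def differential_skew(length_mismatch_mm, delay_ps_per_mm, data_rate_gbps):
    """Return (skew in ps, skew as a fraction of one unit interval)."""
    if data_rate_gbps <= 0:
        raise ValueError("data_rate_gbps must be positive")
    skew_ps = length_mismatch_mm * delay_ps_per_mm
    unit_interval_ps = 1000.0 / data_rate_gbps  # one bit period in ps
    return skew_ps, skew_ps / unit_interval_ps


# Hypothetical case: 2 mm mismatch within a pair on a 5 Gbit/s link,
# with an assumed propagation delay of 6.7 ps/mm.
skew_ps, fraction = differential_skew(2.0, 6.7, 5.0)
print(f"Skew: {skew_ps:.1f} ps ({fraction:.1%} of the unit interval)")
```

How much of the unit interval can be given up to skew depends on the link's overall jitter budget, so the acceptable fraction has to come from the interface specification.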
1. Serializer-deserializer (SERDES) I/O uses differential signaling. Noise can reduce voltage separation of the differential signal, making the data more difficult to detect. An “eye diagram” can be used to monitor the differential margin and the noise, which might cause jitter and close the detection window. (courtesy of LeCroy Corp.)
Mechanical reliability requires accounting for mechanical stresses. For example, simply inserting a board or tightening it down can severely stress the solder joints that attach a package to the PCB. Figure 2 shows pad cratering, where a solder joint stressed by PCB bending caused polymer cracking and trace lifting. These cracks can propagate through the traces, creating intermittent contacts and electrical opens. Other types of failures such as leakage also are possible.
2. Pad cratering occurs when twisting/bending or thermomechanical mismatches create stresses sufficiently large to lift the solder pad or crack the underlying material. (Reference 1)
Another stress-related concern is shock damage. When a device such as a notebook computer lands on a hard surface, complex vibrations pass, tsunami-like, through the PCB, stressing the components mounted to it.
There is no simple way to predict whether a particular component will survive severe stress, be damaged, or have its connections broken. Determining the likelihood of damage requires dynamically modeling the system, or attaching strain gauges and performing a real-world “drop test.”
There are standardized drop tests that evaluate components soldered to specific boards. The results of such tests apply only to the board used. A component that survives 500 drops on a standard board won’t necessarily withstand 500 drops on your board. Such testing is useful, however, if the stresses measured under defined conditions are compared with those generated when drop-testing the board for your particular product.
Stiffening methods (such as plates and frames) can be used to more evenly distribute shock force, reducing the stress on solder joints.
Some designers improve drop-test performance by applying underfill (UF) materials that effectively “glue” the package to the PCB. UF materials with appropriate expansion values, glass transition temperatures, and modulus properties must be chosen to avoid accelerating solder ball fatigue during thermal cycling or power cycling.
Dynamic modeling is a complex process, so most developers wait until systems have been built, then test the physical hardware. If the failure rate is excessive, redesigns might be required that would not have been necessary had the system been initially computer-modeled.
Thermal reliability accounts for how the PCB acts like a heatsink soldered to the IC package. Ideally, we’d like the PCB to conduct away as much heat as possible. The cooler a device runs, the fewer problems there are with thermally accelerated failure mechanisms such as electromigration, parametric drift, and material degradation (embrittlement, for example).
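The benefit of running cooler is often estimated with the Arrhenius model, in which a thermally activated failure mechanism speeds up by a factor AF = exp[(Ea / k)(1/T_cool − 1/T_hot)], with temperatures in kelvin. The sketch below is illustrative only. The 0.7-eV activation energy is an assumed placeholder, because the real value depends on the failure mechanism and must come from reliability data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_cool_c, t_hot_c, activation_energy_ev):
    """How many times faster a mechanism proceeds at t_hot_c than at t_cool_c."""
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_cool_k - 1.0 / t_hot_k)
    return math.exp(exponent)


# Hypothetical case: junction at 105 C versus 85 C, assumed Ea = 0.7 eV.
factor = arrhenius_acceleration(85.0, 105.0, 0.7)
print(f"Acceleration factor: {factor:.2f}")
```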
In practice, PCB features such as slots, thermal land pads, thermal spreading planes, and core vias that conduct heat into power and ground planes influence the board’s effectiveness in “draining” heat from devices. System-level design software should be used to assure adequate thermal performance.
Unfortunately, some software doesn’t analyze thermally important features such as slots in power or ground planes or the impact of vias. The designer is obliged to include their effects through other means, such as lumped thermal conductivities for via blocks, or effective thermal conductivities for breaks in power or ground planes.
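One simple way to obtain a lumped conductivity for a via block is an area-weighted (parallel-path) average of the copper barrels and the surrounding laminate in the through-thickness direction. The sketch below assumes unfilled vias on a square grid. The conductivities of roughly 390 W/m·K for copper and 0.3 W/m·K for laminate are typical assumed values, and the geometry is hypothetical.

```python
import math


def via_block_k_eff(drill_mm, plating_um, pitch_mm, k_copper=390.0, k_laminate=0.3):
    """Effective through-thickness conductivity (W/m.K) of a via array.

    Each via occupies one pitch x pitch cell. The plated barrel conducts
    in parallel with the laminate; the open hole is treated as
    non-conducting.
    """
    r_outer = drill_mm / 2.0
    r_inner = r_outer - plating_um / 1000.0
    if r_inner < 0 or pitch_mm <= drill_mm:
        raise ValueError("geometry is not physically consistent")
    cell_area = pitch_mm ** 2
    copper_fraction = math.pi * (r_outer ** 2 - r_inner ** 2) / cell_area
    laminate_fraction = 1.0 - math.pi * r_outer ** 2 / cell_area
    return copper_fraction * k_copper + laminate_fraction * k_laminate


# Hypothetical case: 0.3 mm drill, 25 um plating, 1.0 mm pitch.
k_eff = via_block_k_eff(0.3, 25.0, 1.0)
print(f"Effective via-block conductivity: {k_eff:.1f} W/m.K")
```

Even a small copper fraction dominates the result, which is why leaving vias out of a thermal model can badly underestimate how much heat the board conducts.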
The heat conducted by the board is ultimately removed by convection. If convection is important in keeping components cool, the designer might want to consider the effect of dust accumulation in reducing convection efficiency, especially if it’s unlikely the board will ever be cleaned.
Thermomechanical reliability requires accounting for mechanical stresses created by temperature cycling. Failures can range from ICs popping out of their sockets to cracked and lifted traces.
Figure 3 shows a solder joint damaged by thermal fatigue. Because device packages often expand and contract at a rate different from the PCB, wide temperature swings cause significant mechanical stress.
3. This is a cross-sectional photomicrograph of a fatigue-failed solder joint. The fatigue crack path is in the solder ball along a thin layer of solder connecting to the die.
As solder isn’t particularly flexible, it will deform, fatigue, and eventually crack—sometimes to the point where the joint fails completely. Minimizing thermally induced solder fatigue requires a number of material and design decisions.
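The size of the problem can be gauged with a widely used first-order estimate of the cyclic shear strain in a joint: γ ≈ DNP × Δα × ΔT / h. Here DNP is the joint's distance from the package's neutral point (usually its center), Δα is the difference in expansion coefficients, ΔT is the temperature swing, and h is the joint standoff height. The sketch below ignores board bending and solder creep, and all values are hypothetical.

```python
def cte_shear_strain(dnp_mm, delta_cte_ppm_per_c, delta_t_c, standoff_mm):
    """First-order shear strain in a solder joint from expansion mismatch."""
    if standoff_mm <= 0:
        raise ValueError("standoff_mm must be positive")
    displacement_mm = dnp_mm * delta_cte_ppm_per_c * 1e-6 * delta_t_c
    return displacement_mm / standoff_mm


# Hypothetical case: corner joint 10 mm from the package center,
# 10 ppm/C expansion mismatch, 100 C swing, 0.3 mm standoff.
strain = cte_shear_strain(10.0, 10.0, 100.0, 0.3)
print(f"Estimated shear strain per cycle: {strain:.2%}")
```

The estimate shows why the outermost joints of large packages tend to fail first, and why a taller joint reduces the strain for the same mismatch.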
The PCB’s solder lands must be appropriately sized for the package’s leads. When the lands on one side of the joint (board or package) are significantly larger than those on the other, thermal stresses are unevenly distributed, with the smallest lands subjected to the greatest stress.
Package data sheets often recommend specific PCB land patterns. These recommendations should be followed (Fig. 4).
4. Dimensions “A,” solder mask opening, and “B,” land diameter, are often called out for PCB lands on traces for an area-array solder ball.
Enough solder must be used to provide sufficient standoff from the board. Package documentation usually suggests a stencil pattern and thickness that ensure the appropriate amount of solder is deposited.
If the package has a lead-free finish or lead-free solder balls, many manufacturers recommend using lead-free solder pastes for best performance.
PCB thickness can reduce or worsen the effects of thermal cycling. Thicker boards are less flexible, creating greater stresses in solder joints as the temperature changes. Thinner boards flex more easily, letting the system relax and reducing the solder joint stress. You can imagine the limiting case when the PCB is infinitely thin—there would be no stress on the solder joint during a temperature cycle. So, when judging component lifetime from published data, it’s important to know the thickness of the PCB used in the experiments, not just the relative expansion between the experimental and system PCB.
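How strongly thickness affects stiffness can be seen from the flexural rigidity of a plate, D = E × t³ / [12(1 − ν²)], which grows with the cube of the thickness t. The sketch below compares two board thicknesses. The modulus of 20 GPa and Poisson's ratio of 0.15 are assumed round numbers for a glass-epoxy laminate, not measured values.

```python
def flexural_rigidity(thickness_m, youngs_modulus_pa=20e9, poisson_ratio=0.15):
    """Plate bending stiffness D = E * t^3 / (12 * (1 - nu^2)), in N.m."""
    return youngs_modulus_pa * thickness_m ** 3 / (12.0 * (1.0 - poisson_ratio ** 2))


# Hypothetical comparison: 1.6 mm board versus 0.8 mm board.
thick = flexural_rigidity(1.6e-3)
thin = flexural_rigidity(0.8e-3)
print(f"Stiffness ratio, 1.6 mm to 0.8 mm: {thick / thin:.1f}")
```

Because the ratio depends only on the thickness, it holds whatever modulus is assumed. This is why solder-joint lifetime data taken on one board thickness transfers poorly to another.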
Component placement on opposite sides of a PCB can increase or decrease its flexibility. Identical components placed directly opposite each other on the two sides of the PCB locally stiffen the board so that it behaves as if it were far more rigid, reducing thermal cycling lifetime.
With careful system-level design, electrical, mechanical, thermal, and thermomechanical reliability can be significantly improved, making for greater customer satisfaction and longer product life. The required analysis should be part of the initial design and not be put off until prototyping.
1. John Wemekamp, “Preparing for Lead-Free Electronics,” Military Embedded Systems, October 27, 2009.