Output Compression for IC Fault Detection Using Compressive Sensing

Citation

Permanent link
http://nrs.harvard.edu/urn-3:HUL.InstRepos:10000990

Terms of Use
This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Share Your Story
The Harvard community has made this article openly available. Please share how this access benefits you. Submit a story.

Accessibility
Output Compression for IC Fault Detection Using Compressive Sensing

Stephen J. Tarsa and H.T. Kung
Harvard University, Cambridge, MA

Abstract—The process of detecting logical faults in integrated circuits (ICs) due to manufacturing variations is bottlenecked by the I/O cost of scanning in test vectors and offloading test results. Traditionally, the output bottleneck is alleviated by reducing the number of bits in output responses using XOR networks, or computing signatures from the responses of multiple tests. However, these many-to-one computations reduce test time at the cost of higher detection failure rates, and lower test granularity. In this paper, we propose an output compaction approach that uses compressive sensing to exploit the redundancy of correlated outputs from closely related tests, and of correlated faulty responses across many circuits. Compressive sensing's simple encoding method makes our approach attractive because it can be implemented on-chip using only a small number of accumulators. Through simulation, we show that our method can reduce the output I/O bottleneck without increasing failure rates, and can reconstruct higher granularity results off-chip than current compaction approaches.

I. INTRODUCTION

Detecting faults in integrated circuits (ICs) that arise due to manufacturing variations is a notoriously difficult and costly problem. Colloquially, large chip design projects employ far more engineers working on IC verification than initial circuit design. In the case of military-grade ICs, reliability and security demands are a significant barrier for manufacturers [1]: in the past, the cost of testing MIL-SPEC ICs has led, in part, to some large companies leaving the marketplace and focusing solely on consumer electronics [2]. Going forward, battlefield dependence on devices driven by increasingly complex ICs means that fast accurate fault detection is becoming even more difficult and critical.

In practice, the logical-fault detection process proceeds by exercising the logical states of newly manufactured ICs using a sequence of test vectors, then verifying test responses against simulated fault-free responses. The steps of this process require first identifying a minimal set of test vectors whose application will reveal a near-total percentage of possible faults [3] [4] [5]. Second, the desired tests are applied by shifting test stimuli into the circuit using an industry standard (JTAG) interface [6]. And third, test responses must be shifted out of the circuit for examination. When faults are detected, the process may be repeated with a second set of more expressive test vectors to collect additional information on circuit behavior. Since the number of states taken on by VLSI circuits is huge, the volumes of input test data and output response data make testing I/O a bottleneck to the manufacturing process.

In addition to past efforts to reduce scan-in time for test vectors (e.g. [7], [8], and [9]), researchers have also developed methods for compacting test-responses on-chip to reduce output. Lossless compression circuitry is generally considered too costly, so lossy compaction methods are preferred. While current compaction methods improve test time, they lead to reduced test response granularity and increased false negative detection rates. Such detection failure is termed aliasing, since faulty test responses incorrectly appear fault-free due to the compaction process.

The most widely applied compaction method is space compaction, which uses XOR networks to reduce many bits from a single test response down to a few bits to be used for detection [10] [11] [12]. XOR networks are easy to implement, but can have high aliasing rates, since any even number of faulty response bits will lead to an erroneously fault-free detection bit when XORed together. Furthermore, space compaction detects faults to test vector granularity, and cannot distinguish between multiple faults expressed in the same test; this behavior can occur unexpectedly, or by design in order to reduce total test time.

The second compaction method used is time compaction, in which a signature is computed from multiple sequential test responses using a Multiple Input Signature Register (MISR). MISRs can yield larger reductions in output with fewer false negatives than XOR networks, but they also severely reduce detection granularity. With this method, faults are detected to one-in-many tests based on a single signature, and little diagnostic information is made available to designers. MISRs are also intolerant to test responses with unknown ‘X’ values, which arise from uninitialized flip-flop states. Additional circuitry must typically be included to mask off these ‘X’ values, increasing hardware cost [13].

We instead propose to apply the techniques of compressive sensing (CS) [14] to implement lossless output compression with only simple on-chip circuitry. With CS, we compress response data using a small number of accumulators that collect linear combinations over batches of test responses. The resulting reduced-dimension measurement vectors are then shifted out of the circuit, and results are reconstructed off-chip using a sparse signal recovery algorithm. This has several advantages over current methods:

1) Lossless compression of test responses is possible with simple on-chip circuitry
2) Successful signal reconstruction yields pin-level data off-chip, a result that is impossible with current lossy compaction methods
3) Aliasing rates are lower than both XOR networks, and MISRs that compact small numbers of test responses for highest granularity results.

We first formulate the fault detection problem in the compressive sensing framework. Since CS requires that signals be expressible with a small number of non-zero components (i.e. sparse), we present a method for finding bases that sparsely express faults. Our technique works by ordering input test vectors to promote the locality of correlated test responses, and exploiting the resulting correlations between successive tests using the Karhunen-Loève transform (KLT). We then show how faulty behaviors decoded from previously examined circuits can also be encoded in this basis, to exploit redundancy among circuits. We compare our method to both time and space compaction to show gains in compression rate, output granularity, and false negative rate. Finally, we simulate the testing process to demonstrate performance.

II. COMPRESSIVE SENSING OUTPUT COMPRESSION FORMULATION

Compressive sensing states that sparse signals encoded, or measured, via linear combination can later be reconstructed using a small number of these measurements, provided the measurement coefficients satisfy a Restricted Isometry Property (RIP). Reconstruction is possible using algorithms for $\ell_1$ minimization by linear programming [15], or basis pursuit [16] [17]. Furthermore, any basis for sparse representation can be used for decoding, provided the number of measurements is approximately proportional to the number of non-zero components required to represent the signal in that basis. Sufficient measurement matrices satisfying RIP include those with normally distributed or Bernoulli random entries, and those whose rows are chosen from the Fourier basis functions [18].

Using these facts, we can express test response compression for fault detection in terms of CS’s encoding and decoding steps. Defining $X_t$ to be a batch of $n$ test responses collected at time $t$, we write the encoding process as:

$$y_t = \Phi X_t$$

where $\Phi$ is an $m \times n$ ($m \ll n$) matrix with randomly distributed entries that have been predetermined prior to test time. $m$ is chosen to be approximately proportional to $k$, the number of non-zero values expected in the decoded signal vector. When $\Phi$ contains Bernoulli $\{0, 1\}$ entries, $m \approx 4k$.

Recovering test outputs off chip requires solving the following minimization problem:

$$\min ||X^o_t - X_t||_1 \text{ s.t. } y^o_t - y_t = \Phi \Psi_t (X^o_t - X_t)$$

where $X^o_t$ is a vector of simulated gold circuit outputs, $y^o_t$ the corresponding simulated measurements, and $\Psi_t$ is a sparsifying basis for $(X^o_t - X_t)$. Since a single fault may cause the values of many test outputs to deviate from the gold circuit, we are interested in finding an appropriate $\Psi_t$ so that faults are represented with only a few components.

III. SPARSIFYING BASES FOR TEST OUTPUTS

A. Exploiting Correlations Between Related Tests

Consider batches of test outputs collected at times $t = 1, 2, ..., T$. The KLT basis is spanned by the eigenvectors of their autocorrelation matrix: $A = \sum_{t=1}^T (X_g - X_t)(X_g - X_t)'$ [19]. When circuit outputs in multiple batches are correlated, the KLT is a sparsifying basis that represents these correlated outputs with common components. For example, at the extreme when all $X_t$’s are linearly dependent, then they can be represented by a single component in the KLT basis.

Observe Figure 1(a), in which input test vectors for one of our benchmark circuits are placed in an alignment matrix, according to their Gray Code ordering. Based on the expectation that two tests will share many of the same sensitized paths when their test vectors only differ by a few bits, then the rows of the corresponding output response matrix will show correlation. This will hold when faults occur in gates along those shared paths, or when the paths are fault free, as Figure 1(b) shows for our test circuit.

We therefore take advantage of these correlations using the KLT by arranging test vectors in an $n$-row alignment matrix, and applying tests in batches according to the columns of the matrix. As batches of $m$ measurements are offloaded and test responses are decoded, we use the trailing $T$ column vectors to calculate the KLT basis. The result, $\Psi_t$, is used to sparsely represent the next batch, $X_t$ at time $t$.

The alignment parameters $T$ and $n$ are circuit specific – if $T$ is too large, then $\Psi_t$ will likely do a poor job of sparsely representing $X_t$ because it may include batches with uncorrelated test results. In general, $\Psi_t$ is a sparsifying basis for $X_t$ whenever $X_t$ can be written as a small linear combination of the preceding batches’ KLT components. In practice, simulations of the fault-free gold circuit outputs can be used to determine the appropriate values for $T$, $n$, and $m$.

B. Exploiting Correlations Among Circuits

Unavoidably, test outputs will arise that are not sparsely represented based on the behavior of prior tests. This can occur when circuit architecture makes finding an alignment matrix that is successful for all batches difficult, or when unexpected faulty responses occur that deviate substantially from prior test outputs. In this case, off-chip decoding will fail due to insufficient measurements. However, once these test responses are retrieved, they can be incorporated into future $\Psi_t$’s by adding the observed vector to the KLT: $A = \sum_{c \in C} \sum_{t=1}^T (X_g - X^c_t)(X_g - X^c_t)'$ where $C$ denotes an index set of previously examined circuits. Therefore, when decoding fails, we offload the current vector at full rate, paying a one-time penalty in order to sparsely represent the faulty responses in future tests.

IV. IC TESTING PROCESS FLOW

In order to implement the sparsification procedures of Section III, Figure 2 shows an IC testing flow that includes two phases: training and circuit testing. The training stage is used to capture common faulty test responses in initial $\Psi_t$
Fig. 1. (a) In order to promote locality of correlated test output values among adjacent columns, input vectors are ordered in increasing order by their integer representation across rows in a alignment matrix. They are then applied to the circuit in batches by column, denoted \( I_1 \). (b) For our benchmark circuit, rows of output responses do indeed show high correlation that can be exploited for compression.

V. PERFORMANCE ANALYSIS & COMPARISON

We compare our method to space and time compaction by analyzing output reduction rates, aliasing rates, off-chip response granularity, and hardware cost.

A. CS-Based Output Compression

For a circuit with \( L_{\text{out}} \) output pins, the output compression factor using our method is given by:

\[
1 - m \log_2 (\frac{1}{n} 2^{L_{\text{out}}}) \quad \frac{n L_{\text{out}}}{n L_{\text{out}}}
\]

Raw output requires \( n L_{\text{out}} \) bits to offload \( n \) outputs. In contrast, our compressive sensing method requires only \( m \) measurements for a batch of \( n \) results. The maximum value in each accumulator is \( \frac{1}{2} n 2^{L_{\text{out}}} \) since half of the Bernoulli random coefficients will be 1’s on average, and the maximum value of each test output is \( 2^{L_{\text{out}}} \).

Aliasing occurs when the measurement vector of a faulty batch of test responses matches that of the simulated fault-free responses. This probability is given by:

\[
\Pr(y^q_t = y_t | X^q_t \neq X_t) = \frac{2^{(n - m) L_{\text{out}}}}{2^{n L_{\text{out}}} - 1} \quad (4)
\]

This expression is derived as follows: with \( L_{\text{out}} \) bits-per-response, all but one of the \( 2^{n L_{\text{out}}} \) possible batch vectors are faulty. Since the system is under-determined by \( n - m \) degrees of freedom, \( 2^{(n - m) L_{\text{out}}} \) vectors satisfy the measurement vector \( y_t^q \), and all but one of these is faulty.

The granularity of offloaded measurements is always to the pin-level. Either the vector is sparsely represented by \( m \) measurements, in which case it is successfully decoded, or it is offloaded at full rate upon decoding failure.

B. XOR Network Compaction

We next compare to space compaction using the X-Compact [12] method for XOR networks. Test responses with \( L_{\text{out}} \) bits are reduced to \( L_{\text{sig}} \) detection bits by XORing together \( L_{\text{sig}} \) distinct combinations of response bits. The output reduction rate is therefore \( 1 - (L_{\text{sig}} / L_{\text{out}}) \).

Whenever an even number of faulty bits are XORed together, the result will be erroneously fault-free: aliasing occurs when this happens at all \( L_{\text{sig}} \) outputs. The overall aliasing rate is therefore driven directly by the output reduction rate, since a greater number of unique XOR combinations for a fixed \( L_{\text{out}} \) reduces the probability that all detection bits will be erroneous.

We quantify this aliasing rate by simulating the response of random X-Compact networks to faults in the benchmark...
circuit shown in Figure 5. The results are shown in Figure 3, and we see that aliasing increases sharply when fewer distinct XOR combinations are used. For this circuit, the best achievable compaction with an aliasing rate below 1% is only a 35% reduction. While an XOR network can be engineered for lower aliasing in response to known faults, unanticipated faults will cause unpredictable aliasing, potentially increasing the overall rate.

At all output rates, detection occurs to the test-vector level, since the $L_{\text{sig}}$ bits only provide redundancy against aliasing. This method cannot distinguish between multiple faults exercised by a single test.

C. MISR Compaction

We next compare to MISR time compaction, assuming no unknown 'X' values. Combining $s$ test responses with $L_{\text{out}}$ bits into a single signature of $L_{\text{sig}}$ bits leads to an output reduction of:

$$1 - \frac{L_{\text{sig}}}{s * L_{\text{out}}}$$

(5)

Adopting the model in [11], the aliasing rate can be calculated by the probability of hash collision:

$$\Pr(\text{Collision}) = \frac{2^{sL_{\text{out}} - L_{\text{sig}} - 1}}{2^{sL_{\text{out}} - 1}}$$

(6)

For a fixed $L_{\text{out}}$, MISRs achieve best output reduction and lowest aliasing rate when $s$ is large. However, this reduces fault detection granularity to 1-in-$s$ tests. In Figure 3, we show what happens when we set $s = 2$, in order to maximize the granularity of off-chip results. In this case, 60% compaction can be achieved before aliasing rises above 1%.

In contrast, our CS-based method similarly achieves best compression and aliasing performance when $n$ is large, however it suffers no such loss in granularity. Figure 3 shows that the aliasing rate decreases significantly when we increase the batch size to 32 and offload the maximum number of measurements given a fixed output budget. Achieving these low aliasing rates without a granularity penalty is unique to our CS-based method.

D. Required Circuitry

We next examine the hardware cost of our method, using flip flop count as a comparison metric. Our on-chip architecture is illustrated in Figure 4(b). Encoding requires $m$ accumulators with a maximum value of $\log_2(\frac{1}{2}n2^{L_{\text{out}}})$. We use $m$ seeded random $\{0, 1\}$ number generators to store each row of measurement coefficients. Since $m < n$, $n - 1$ unique initialization states are required. Flip flop count is then:

$$m \left[ \log_2(n - 1) + \log_2\left(\frac{1}{2}n2^{L_{\text{out}}}ight) \right]$$

(7)

Both X-Compact and MISR methods require only $L_{\text{sig}}$ flip flops, meaning that our method has a higher hardware cost, especially when $m$ is large. However, it is possible to reduce the number of accumulators in exchange for a higher decoding failure rate in order to meet cost or area budgets.

VI. Evaluation

To evaluate our methodology, we simulate the fault detection process using a benchmark circuit from the ISCAS-85 suite [20], shown in Figure 5. We simulate both the training and testing phases described in Section IV, testing 10,000 ICs. IC's have a 10% probability of a fault occurring, and for these circuits, the number of faults present is normally distributed over the range 1 to 6. During the training phase, we set aside 5% of faults as “unanticipated” behaviors during testing.

For our CS method, we tune parameters $n = 512$ and $T = 16$. Given that at most 6 faults should occur in any circuit, we test with $m = 24$ and $m = 16$. This second parameter value has a lower hardware cost, but a higher probability of decoding failure when many faults occur. To compare with time compaction, we use two signature methods: MISR-2, which takes in 2 8-bit test responses and outputs a single 6 bit signature, and MISR-4, which takes 4 8 bit test responses and outputs a single 7 bit signature. The first is chosen for highest granularity results, and the second for better compaction and aliasing performance. For both methods, the characteristic polynomial is defined by a randomly generated bit stream, fixed for each circuit. To compare with space compaction, we randomly generate X-Compact networks that output 5 bits, the best reduction for sub 1% aliasing based on Section III B.

The results are shown in Table I. Our CS method delivers output reduction rates of 90.68% and 93.64%, respectively. Though taking only $m = 16$ measurements increases the number of decoding failures by 7x, the total output is reduced since fewer measurements are taken for compressible faults. The CS method achieves an 8% output reduction improvement over MISR-4, and a 56% improvement over the XOR network approach. Additionally, we observe no aliasing with our method, while all compaction methods have non-zero aliasing.

The gains of our CS implementation come at the cost of a 45x increase in the number of flip flops over MISR-4, the second best compaction method. This may be a prohibitive cost.
Figure 4: The DISTROY front-end applies \( N \) randomly chosen test vectors to a CUT, measures corresponding leakage ... 20\ N\ NAND gates as depicted in Figure 5. Lastly, we use five double-c17 blocks to produce double-c17x5 shown in Figure 6.

![Diagram](image_url)

VII. FUTURE WORK

A major avenue for future work will be extending our evaluation to additional circuits with varying size, gate count, path length, and fan-in properties. Beyond the examples in the ISCAS-85 and ITC-99 benchmark suites, the work of [21] has developed methods for Monte Carlo simulation by analyzing prototype circuits for the above properties, and randomly generating similar circuits. Of particular interest is the analysis of our method under test sequences that exercise the internal states of flip flops in sequential circuits. With a larger number of tests to run, and sets of output results that toggle only internal memory states, we expect the amount of output correlation to increase, leading to better performance for our method.

A second major avenue of work is to investigate encoding pin outputs using analog circuitry. Analog circuitry exists for computing Fourier transforms, and as noted in Section II, it may be possible to piggyback off these components to encode signals at minor hardware cost. In addition, analog domain encoding would capture more information, including real-valued voltage readings that are useful for detailed circuit analysis.

VIII. CONCLUSION

In this paper, we presented a method for lossless output compression for the fault detection process that uses simple IC-independent test circuitry. We introduced new techniques for calculating bases that represent faults with a few non-zero coefficients. These techniques capture correlations between related fault tests and across circuits in the testing pipeline. Using the random linear encoding process of compressive sensing on-chip, and sparse signal recovery off-chip, we showed that we can reduce the size of test outputs without sacrificing granularity or increasing detection failure rates. This improves significantly upon current industry-standard methods of compaction in these respects. As a result, our CS-based compression method yields orders of magnitude more data to designers for fault isolation. Given the criticality of fault detection for MIL-SPEC ICs, where strict detection requirements are enforced, these gains have the potential to greatly improve the efficacy of testing and verification for designers and manufacturers.

ACKNOWLEDGMENTS

This material is based on research sponsored by Air Force Research Laboratory under agreement numbers FA8750-10-2-0115 and FA8750-10-2-0180. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory or the U.S. Government. The authors would like to thank the Office of the Secretary of

<table>
<thead>
<tr>
<th>Method</th>
<th>Flip Flop Count</th>
<th>Output Reduction Rate</th>
<th>Aliasing Rate</th>
<th>Granularity</th>
<th>CS Decoding Failures</th>
</tr>
</thead>
<tbody>
<tr>
<td>CS ( m=16 )</td>
<td>320</td>
<td>93.69%</td>
<td>0%</td>
<td>pin-level</td>
<td>77.16%</td>
</tr>
<tr>
<td>CS ( m=24 )</td>
<td>483</td>
<td>90.68%</td>
<td>0%</td>
<td>pin-level</td>
<td>10.146</td>
</tr>
<tr>
<td>MISR-4</td>
<td>7</td>
<td>78.12%</td>
<td>5%</td>
<td>1-in-4 tests</td>
<td>10%</td>
</tr>
<tr>
<td>MISR-2</td>
<td>6</td>
<td>62.50%</td>
<td>6%</td>
<td>1-in-2 tests</td>
<td>20%</td>
</tr>
<tr>
<td>XOR</td>
<td>5</td>
<td>37.50%</td>
<td>1%</td>
<td>1-in-1 tests</td>
<td>30%</td>
</tr>
</tbody>
</table>

TABLE I
Defense (OSD/ASD(R&E)/RD/IS&CS) for their guidance and support of this research.

REFERENCES


