Project: Random Graphs and Applications to Codes and Cryptography
Student Researchers: Laura Elena Serban, Evdokia Nikolova, Mihaela Enachescu
Advisor: Michael Mitzenmacher
Institution: Harvard University



1. Introduction

Our project is motivated by the necessity of reliable transmission of information over channels that introduce errors. A code is a method to recover the original message in case errors were introduced during transmission. It does it by adding redundant information to the original message, resulting in a mapping from the message to an encoding. The theoretical origins of coding theory are marked by Shannon’s 1948 paper. Shannon proved the existence of families of error-correcting codes that can perform with arbitrarily small probability of error, up to the rate given by the capacity of the channel [1]. He also showed the converse – that we cannot find codes that do better than the channel capacity. This provided an initial upper bound for the codes we can hope to find.

Hence, the initial research focus was on discovering codes that get arbitrarily close to the capacity of the channel, and which have polynomial time encoding and decoding algorithms. The most popular example is the family of Reed-Solomon (RS) codes that provide optimal information recovery for sufficiently large block lengths, which were discovered and studied in the 1960s. However, for practical values of the block length, their best encoding and decoding algorithms are quadratic in time. Although algorithms with lower asymptotic bounds were discovered and applied to RS codes, the overhead of the constant factor hidden by the big-O notation makes them less efficient for most practical block lengths [2].

The necessity of finding codes with more efficient encoding and decoding schemes has prompted a great deal of attention on finding good low-density parity-check (LDPC) codes. These were first introduced by Gallager [2] in 1963, forgotten, and then rediscovered three decades later. These codes have very efficient encoding and decoding algorithms (due to the low-density requirement) for a variety of channels. A number of techniques have been developed for constructing capacity-approaching LDPC codes of very large, and hence not so practical block length [3], [7], [8]. The general idea is to analyze the performance of ensembles of codes, and find ones with good average performance. Then, based on concentration results, which were proven to hold for sufficiently large block lengths [8], random codes are ensured to have good performance as well. This is due to the fact that the performance is concentrated around the average performance, and hence a good code of large enough length can be generated with high probability by randomly selecting any code in the ensemble.

However, the analysis that holds for large codes is not useful for small block length codes. In fact, empirical results show that for small block lengths (under 10,000 message bits), the variations in performance can be quite significant [8]. Nevertheless, short codes are very important for applications that require the exchange of short and frequent messages, which need to be encoded through such codes to avoid unnecessary overheads and delays. The bloom of such applications in the context of digital and satellite communication, some of which need to transmit data through binary symmetric channels, underscores the importance of finding efficiently encodable and decodable small codes over such channels.

In our work we apply a novel method to search for good low-density parity-check codes of small block lengths for the binary symmetric channel. In addition to the implementation and testing of a newly designed search method for small codes, our project includes a valuable theoretical component that summarizes previous research in the area of LDPC codes. While we started working on the project knowing almost nothing about coding theory, we now feel we have gained a much deeper understanding of the field. The theoretical component of this project has been particularly valuable as it provided us with new directions of research.

This report is structured as follows. We begin by presenting our theoretical research of the field. Then we briefly outline previous work undertaken in the area of small block length codes in the third chapter. Next, we describe our approach and design, as well as our results. At the end we conclude with suggestions for further work.

2. Low-Density Parity-Check Codes

The emphasis in the earlier years of coding theory was on codes whose basic structure was algebraic. A different approach, more difficult to evaluate at the time, was the use of random codes. Shannon's proof of the capacity of a noisy channel, as well as many similar fundamental results in coding theory were derived by analyzing the average properties of a large class of codes. Low-density parity-check codes are based on the intuition that the best of several codes chosen at random from an ensemble of codes will have properties at least as good as the average ones. The crucial innovation was Gallager's introduction of low complexity iterative decoding algorithms, which can decode very close to the channel capacity. LDPC codes can be used for a variety of channels. Despite their very attractive properties, LDPC codes were forgotten for three decades, and rediscovered recently. One practical issue, that remained open for a period of time, was the encoding costs. Currently, however, several encoding techniques have been proposed for LDPC codes that lead to linear or nearly linear encoding [3], [10].

The binary-symmetric channel is a channel for which both the input and the output are sequences of the binary digits 0 and 1. The channel is memoryless, meaning that for every input the probability that the output digit is switched is p, and the probability that the output is the same is 1-p. The channel is entirely specified by this crossover probability.

The channel capacity is a measure of the minimum amount of redundant information necessary to retrieve the original message. It has been shown that most large codes have very good performance. In contrast, little is known on the behavior of small codes.

One of the more useful ways of describing an LDPC code (for decoding purposes especially) is to represent it using its parity-check matrix, defined so that it returns the zero vector when multiplied by any codeword in the code it specifies. For LDPC codes the parity check matrix also satisfies a certain sparsity condition, in that only a certain fraction of the entries are non-zero. This condition might be requiring the matrix to have a certain number of 1's per column or row. A matrix satisfying the given constraints is then selected at random by a process that we discuss in more detail below.

Example 1 [Parity-Check Matrix of a code of length 12]



As stated above, the code represented by the above matrix is formed of all codewords x satisfying Hx' = 0'. (*)

When describing and analyzing the different algorithms for encoding and decoding, it is generally useful to look at an LDPC code as a bipartite graph with the message nodes on the right representing columns in the parity-check matrix, and the check nodes on the left representing the rows. A check node implies a dependence relation on some of the message nodes (see Figure 1). The dependence can be extracted from the parity check matrix. Edges connect the check nodes to the message nodes that they depend on. In the above example we would have 12 variable nodes (one per column) and 6 check nodes (one per row). A check node represents the constraint given by the corresponding row of the matrix, namely that the sum of the variable nodes corresponding to non-zero entries is 0. In the bipartite graph we connect these variable nodes with the corresponding check node. The number of non-zero entries in the row is the number of connections of the corresponding check node. Because the matrix is sparse, with a linear fraction of non-zero entries, the resulting bipartite graph contains a number of edges, which is a linear (instead of a possibly quadratic) factor of the number of nodes n. This allows for linear time encoding and decoding algorithms, which, as we mentioned in the introduction, is a great practical advantage of the LDPC codes. Note that the condition of sparsity is an overall requirement, which allows for a high degree for a given node (corresponding to a high number of 1's on a given row/column in the parity-check matrix) as long as the other nodes have low degrees to make up for the high degrees of a few other nodes.

If the number of 1's is the same for each column and for each row, that is the degrees of all message nodes and of all check nodes are equal, then the code is regular. Regular codes were the focus of the first studies, by both Gallager, and later by MacKay and Neal [5] who rediscovered LDPC codes. However, the initial attraction to regular designs proved to be misleading. In fact, as shown by Luby, Mitzenmacher, Shokrollahi, and Spielman [3], [4], the best performing codes tend to be of irregular forms, where the degrees of nodes on each side of the graph can vary widely. The underlying degree distribution tends to have a great influence on the performance of the codes. For large enough block lengths, finding optimal degree sequences (specifies the fraction of edges connected to nodes of each degree on the right and as well as on the left) plays a major role in the search for good codes.

Once the degree sequence is specified, we can construct a random code with that degree sequence by pairing up the "edge slots" on the left with the "edge slots" on the right in a random manner. Each vertex would have a number of edge slots corresponding to its degree. For example, in figure 1, the first message node x1 has degree 3 since there are 3 edge slots associated with it; similarly x2 has degree 4 and c1 has degree 5.



The rate of an LDPC code is defined as the ratio of the number of check nodes and the number of message nodes in the bipartite graph representing the code. This is a measure of the efficiency/redundancy of the code.

Linear Encoding algorithm for LDPC codes

A major criticism of LDPC codes has been the apparent high encoding complexity. The most straightforward construction proposed for encoding an LDPC code takes quadratic time in the block length. We will present this construction, and give alternative ideas that reduce the encoding complexity down to linear time.

The basic quadratic algorithm is based on the manipulation of the m x n parity-check matrix of the code. We assume that the parity-check equations are all linear independent, so that the rate of the code (the fraction of non-redundant bits) is exactly n - m/n.

I. Preprocessing step: Using Gaussian elimination, transform the matrix H into lower-triangular form, and use the new matrix H' as the parity-check matrix for the encoding. Note that we can now divide the codeword into a systematic part (the first non-redundant n-m bits), and a parity part, that can be computed using the parity-check matrix from the previous bits.

II. Encoding step: i) Fill in the systematic part with the n-m desired information symbols. ii) Determine the m parity-check bits, noting that, because H' is in lower-triangular form, each of these bits depends only on the bits at lower index, and each bit can be computed by looking at a single row in the matrix H'.

After the O(n^3) operations required to bring the matrix into lower-triangular form, the time needed to do the actual encoding is proportional to the number of non-zero entries in H', since for each parity bit, which is computed by looking at a given row, we need to add all the previous bits corresponding to non-zero entries in the row. Since after doing Gaussian elimination, in general the matrix will no longer be sparse, there are O(n^3) additions to be performed.

If we can guarantee a sparsity condition on H' similar to that on H the above algorithm would be linear. However, simply forcing the parity-check matrix to have lower-triangular form would, in general, result in some loss of performance. Richardson and Urbanke developed in [10] greedy algorithms to transform a sparse matrix into an equivalent almost lower-triangular sparse matrix. They showed that for certain degree sequences these algorithms produce very good results giving rise with high probability to codes that allow transmission close to the capacity of the channel as well as linear encoding complexity.

To simplify our analysis, in this phase of our project we made the assumption that the values of the check nodes do not become corrupted during transmission. Thanks to this assumption, it suffices to simulate the encoding by setting each check node to be the XOR of its neighboring message nodes. In the more realistic scenario when check bits could switch their values during transmission, as well, we would need to employ a cascading code to first correct the information of the check nodes as in Luby, Mitzenmacher, Shokrollahi and Spielman [4].

Hard Decoding Algorithms for the binary symmetric channel

In his thesis, Gallager discussed several decoding algorithms that apply to the underlying bipartite graph representing a code. The algorithms work iteratively, and information is exchanged between nodes in the graph by passing messages along the edges connecting them. The messages represent an estimate of the value of the message bit on the left- hand side of the edge. In the first round, the message nodes simply send the values initially received on the channel. The check nodes respond with a message dependent on the messages received. The message nodes then combine these responses and their originally received value and compute a new message to send. This process continues, hopefully converging on the maximum-likelihood codeword for the received message.

The following algorithms were the original algorithms proposed and analyzed by Gallager [2].

Algorithm 1: For each message nodes, the neighboring check nodes send the XOR of the messages received from all their adjacent message nodes other than the receiver of the message. The message node continues to send the originally received bit unless all messages received from the adjacent check nodes (other than the receiver of the message) disagree with the original value. In the later case he "switches," and sends the value received rather than the original value.

Algorithm 2: Gallager observed that the above algorithm leads to better results when the message nodes switch their value sooner. In this revised algorithm, for each round there is an optimal threshold value, which gives the number of disagreeing message from other check nodes needed for the message node to "switch" the originally received value. The check nodes behave in the same way as in algorithm A.

Gallager shows that the hard-decision decoding algorithms above correct all but an arbitrarily small fraction of errors for regular codes with sufficient expansion. In [4] Luby, Mitzenmacher, Shokrollahi and Spielman present an alternative decoding algorithm that, if used in the last phase of decoding, guarantees successful termination with high probability. In this work we will refer to this algorithm as the "second decoding algorithm." We describe the algorithm used below:

Sequential decoding: if there is a message node that has more unsatisfied than satisfied check node neighbors, flip the value of that message node. Repeat until no such message remains.

The decoding process can run for a predetermined number of rounds, after which each message node can decide its most likely value based on the information it received from its neighbors. If the check nodes are satisfied, then a codeword has been successfully found; otherwise the decoding has failed. Alternatively, after each round, each message node can determine its most likely value and a check can be performed to see if a codeword has been found. If not, the process continues until the decoder decides to stop with a failure.

Most decoding algorithms of such kind generate messages based on extrinsic information, meaning the messages sent to a node should not depend on the messages received from that node. This property, which is important in the proof of the performance bounds for the decoding algorithms, requires the graphs to have no cycles of length less than the number of rounds needed for the results to converge. The proofs show that there exists some parameter depending on the degree sequence of the graphs, such that if the initial fraction of errors is below this parameter, then the fraction of incorrect messages passed at each round decreases exponentially with the number of rounds, under the independence assumption [4]. For cycle-free codes, the fraction of incorrect messages decreases to zero after a sufficiently large number of rounds. This shows that for large enough block lengths, there is a high probability that the decoding algorithm successfully converges for any code, as long as the fraction of errors is below the threshold limit.

Let p_i be the probability that a specific message node sends a check node an incorrect value at round i. Initially, p0 = p, the probability that a message bit has been corrected during transmission. Assuming that the graph does not contain small cycles, the neighborhood of every message node looks like a tree of sufficiently large depth. Gallager searched for the highest possible value p* of the initial probability of error, such that the sequence pi is monotonically decreasing, and hence converges to 0. More precisely, he proves that as the block length of the code and the girth of the graph grow large, the iterative decoding algorithm presented above works for all p < p*. Luby, Mitzenmacher, Shokrollahi and Spielman [4] show that Gallager's algorithm decodes successfully from a large fraction of errors for a randomly chosen graph with high probability, as long as the selected graph has good expansion properties. Note that the expansion properties require that the graph be large.

Gallager showed that the performance of regular codes of sufficiently large block lengths asymptotically approaches the channel capacity [2]. More recent analysis showed that carefully chosen irregular ensembles tend to decode with high probability at rates closer to the capacity of the channel than is the case for the regular ensembles [4]. The specific irregular structures that lead to the best LDPC ensembles depends on both the channel and the specific decoding algorithms used.

Further, we explain briefly the intuition behind the improved performance irregular codes. In a regular code, the number of check nodes corresponding to a given message node is constant, i.e. each message node is equally "protected". To increase protection we would need to have a large number of adjacent check nodes. Thus the message nodes would have a larger degree, and so would the check nodes. However, this would make the check nodes less reliable because they would depend on more message nodes to be received correctly. By contrast, irregular codes do not need to balance reliability and protection uniformly. In fact, with irregular codes it is possible to have a "wave effect", in which the nodes with the best protection are corrected first, and then their results are transmitted through check nodes to the less protected ones [4].

The very efficient decoding and encoding algorithms for LDPC codes, discussed above make them especially attractive for a number of practical applications, in particular for online applications that are especially sensitive to the encoding and decoding time complexities. These online applications, such as telnet, require the frequent and reliable exchange of messages of small length. These applications would greatly benefit from the LDPC coding scheme. This motivates our desire to find good short LDPC codes, which could then be used for the above applications.


3. Previous work on LDPC codes at short block length



In [6] and [7], Mao and Banihashemi propose a heuristic method that efficiently compares randomly generated codes of small block length and detects the better ones with high probability. The idea of the method is based on the intuition that small cycles interfere with the decoding process because the independence assumption is violated and the errors propagate faster than they can be corrected. The method compares the codes based on their girth (the girth of a node in a graph is defined as the size of the smallest cycle containing that node) distribution, which is a function of the cycles in the bipartite graph representing the code, and selects the best ones according to this function. The authors show that in practice their method performs well for the binary symmetric channel.

Prior to Mao's and Banihashemi's work very little progress was made in the direction of developing an efficient and close to accurate method of evaluating and selecting small block lengths codes. More recent work suggests that their method applies best to the binary symmetric channel and its corresponding algorithm, but it can also be used with some success for the erasure channel and its decoding algorithm.

4. Search for good LDPC codes at short block length

Our approach is also based on intuition that small cycles interfere with the decoding process. Nevertheless, our search method for good performing small codes differs in conception from that of Mao and Banihashemi.

As mentioned before, Luby, Mitzenmacher, Shokrollahi and Spielman [4] observe experimentally that the degree sequence of the left and right hand sides of the bipartite graph representing the code drastically influence its performance. In their experiments, they set the left-hand side degree to a constant, and find the optimal degree sequence for the message nodes by solving a linear program. The linear program simulates the decoding algorithm. More specifically, the constraints are linear in the right hand side degree sequence and force the probability of an erroneous value sent by a message node to decrease after each subsequent round of the decoding process.

Let the number of message nodes and check nodes be dl and dr respectively. We denote the degree sequences for the message nodes and check nodes by lambda = (lambda1, lambda2, ..., lambda_dl) and rho = (rho1, rho2, ..., rho_dr). Given a vector rho and the numbers dl and dr, we need to determine an optimal vector lambda. A random code is generated by randomly pairing the edge slots of the message nodes with the edge slots of the check nodes subject to the degree constraints of the nodes. Note that we must have the same number of edges on both sides.

Define rho(x) = sum over i of rho_i * x^(i-1) . Let p_i be the probability that a message node m sends a check node c an incorrect value at round i. Further define f(x) as in [6].



Then f^i(x) = p_i where f^i stands for the value of f(x) at the i-th iteration. For the intuition behind the iterative formula for f^i(x) refer to Luby, Mitzenmacher, Shokrollahi and Spielman [6]. Following upon Gallager's work, these authors show that a random code with a degree sequence determined such that f(x) converges to zero is decodable with high probability as long as the graph is large enough to have good expansion properties. Note also that expander graphs are sparse, and hence the probability of small cycles is low.

In practical terms "large enough" values are larger than the ones we would like to consider here. However, the authors make no remarks regarding the behavior of the Gallager's decoding algorithm when applied to codes of small block length - say of the order of hundreds and thousands of bits - which, in fact, need to be used more frequently in practice.

Intuitively, it is clear that the smaller the block length of the code, the larger the incidence of cycles at small depth of the iterative rounds of Gallager's hard-decoding algorithm. Hence, it would seem that Gallager's algorithm fails for small block-length codes. Nevertheless, this simple observation suggested the idea that by imposing faster convergence constraints in the linear program that is, by forcing the probability of error of the decoding algorithm to decrease faster than required for large codes in the linear program, we might obtain degree sequences that could produce small graphs free of cycles up to an appropriate depth. In particular, as long as we could generate small block-length codes whose girth is larger than the number of rounds required by the decoding algorithm, we would still be guaranteed success with high probability.

In their linear programs, Luby, Mitzenmacher, Shokrollahi and Spielman [4] impose only the weak condition that at particularly chosen values of x, f^i(x) < f^(i+1) (x) for values of i between 100 and 10,000 (this is a guess for number of iterations of the decoding algorithm). We experimented with several rates of reducing the value of f(x), among which the most promising were the exponential rate and various rates linear in the size of the graph. We also imposed the additional constraint that the upper bound on the degree sequences be a fairly small value (25, for example). Since the codes we deal with are small, large degree sequences would inevitably produce undesirable cycles at small depths. Therefore, this condition helps to preclude the formation of small cycles. Unfortunately, a smaller upper bound on the left degree sequence yielded no solution.

Our experiments based on the linear program in the set-up described above suggested the following degree sequences as most promising:


Further, we evaluate the performance of the resulting degree sequences. Recall that our technique does not actually allow us to determine the best values for both left and right sequences, rho and lambda, respectively. Rather, given a vector rho and the numbers dr and dl, we need to determine an optimal vector lambda. Most naturally, we choose the right degree sequence to be regular, namely degrees 10 and 14 in the example codes above. All the codes designed in our experiments have a block length of 1024 bits and rate 1/2, i.e. they contain 512 redundant check nodes.

Based on the information in the table above we constructed appropriate codes as follows. We computed the number of edges in the bipartite graph representing the code using the number of check nodes and their degree, since the right degrees were assumed regular. Then, it is easy to see that the number of message nodes of degree l can be computed as (# of edges)*lambda_l / l, where lambda_l is the fraction of edges connecting to nodes of degree l on the left.

In particular, code 10 has on the left 211 nodes of degree 3, 710 nodes of degree 4, and 103 nodes of degree 16. On the right, it has 511 nodes of degree 10 and one node of degree 11 for balancing. Similarly, code 14 has on the left 711 nodes of degree 5, 208 nodes of degree 6, 26 nodes of degree 21, and 79 nodes of degree 23. On the right, it has 511 nodes of degree 14 and one node of degree 12 for balancing.

To evaluate the optimality of the codes in the above table we compare their performance with that of a regular (4,8) ensemble code, that is a code with degree 4 on the left and degree 8 on the right, again of block length 1024.

The table below presents the percentage of successes (completely correct decoding) on 100 random codewords at the increasing levels of error. The error is quantified as the number of random message nodes corrupted during transmission (out of the 1024 message nodes).



All the codes evaluated have block length 1024 bits and rate 1/2. Code 10 II and Code 14 II stand for the same codes using the second decoding algorithm (after we have used the first one to a point).

The results obtained are encouraging in the case of the regular code, but less so for the irregular codes that we tested. At low probabilities of error, i.e. 0.004, all ensembles of codes perform fairly well, recovering the message completely in 80% of the cases for the irregular codes and 98% of the case for the regular codes. The performance deteriorates faster than we expected as the initial error probability increases; the performance drops below 4% at error level 0.02 in the case of irregular codes, and 0.03 in the case of regular code. Comparing our results with those for large block length ensembles described by Mitzenmacher, Shokrollahi and Spielman [4], we observe a decrease in performance in the sense that our best codes (regular) are able to recover completely with probability larger than › from an initial probability of error of at most 0.015, while their best codes (irregular) are capable of recovering with high probability from an initial error rate of 0.0627. Of course, the better performance of their codes is justified from a theoretical perspective. The large block length (above 16,000 message bits) of the codes evaluated allows the authors to exploit properties like better expansion and absence of small cycles, of which even random codes of such size benefit. Finally, note that in the results presented above we counted only the cases of fully correct decoding as successes. The experiments in which we counted the fraction of recovered errors indicate slightly better performance at larger initial probabilities of error.

Since the lowest left degree of our irregular codes is small, i.e. 3, and hence the corresponding randomly generated codes may not have good expansion properties, we attempted to use the second decoding algorithm (sequential decoding) for the last few rounds of the decoding process. Surprisingly, our results suggest that this decoding algorithm, which is expected to improve the action of codes that are not good expanders, worsened it. The reason behind this unexpected discovery remains to be explored in our future research projects.

5. Conclusion

One reason for the lower performance of the irregular codes may be the presence of fairly high degree nodes on the left, which greatly increases the probability of small cycles in such a small code. Also the rounding errors introduced by the need to have integer solutions, whereas the linear program used gives rational solutions, may lead to a degree sequence that is further from optimal than expected.

As a suggested improvement to our method, one could use the algorithm of Mao and Banihashemi to detect and remove any harmful small cycles. Theoretically, the above results should improve significantly since the probability of small cycles should decrease by the use an additional pruning method.

Our search method and experiments have not yet allowed us to confirm our intuition that irregular codes should perform better than regular codes even in the case of small messages. More specifically, we have not yet discovered any small irregular code designs that perform better than regular ones. So far, our experiments seem to suggest that for small messages, regular codes can recover a larger fraction of errors. In this context, it may be of particular interest to find the threshold length of a message for which irregular codes start to perform better than the regular ones.

References

1. van Lint, J.H. Introduction to Coding Theory, 2nd Edition. Springer-Verlag, 1992.

2. Gallager, R.G. Low Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.

3. Luby M. G., Mitzenmacher M., Shokrollahi M. A. and Spielman D. Efficient Erasure Correcting Codes, IEEE Trans. on Information Theory, Vol. IT-47, pp. 569-584, Feb. 2001.

4. Luby M. G., Mitzenmacher M., Shokrollahi M. A. and Spielman D. Improved Low Density Parity Check Codes Using Irregular Graphs and Belief Propagation. IEEE Trans. on Information Theory, Vol. IT-47, pp. 585-598, Feb. 2001.

5. MacKay, D.J.C and Neal, RM. Good codes based on very sparse matrices. Cryptography and Coding: Proceedings of the 5th IMA conference. Springer-Verlag, 1995, p. 100-111.

6. Mao, Y. and Banihashemi, A.H. A Heuristic Search for Good Low-Density Parity-Check Codes at Short Block Lengths, presented at IEEE ICC2001, Helsinki, Finland, June 11-14, 2001.

7. Mao, Y. and Banihashemi, A.H. Design of Good LDPC Codes Using Girth Distribution, presented at IEEE International Symposium on Information Theory, Italy, June, 2000.

8. Richardson, T.J. and Urbanke, R.L. The Capacity of Low-Density Parity-Check Codes Under Message-Passing Decoding, IEEE Trans. on Information Theory, Vol. IT-47, pp. 599-618, Feb. 2001.

9. Richardson T.J., Shokrollahi M.A. and Urbanke R.L. Design of Capacity-Approaching Irregular Low-Density Parity-Check Codes, IEEE Trans. on Information Theory, Vol. IT-47, pp. 619-637, Feb. 2001

10. Richardson, T.J. and Urbanke, R.L. Efficient Encoding of Low-Density Parity-Check Codes, IEEE Trans. on Information Theory, Vol. IT-47, pp. 638-656, Feb. 2001