Turbo/MAP Codec Status Report 16 Jan 1997

In my last report I had just sent the five remaining PCBs to be wave soldered. I got the completed cards back on 18 December. First, I had to modify all the boards (as described in the last report) and then insert all the Xilinx chips and jumpers. Each board was then tested with and without noise, and all of them worked perfectly.

While waiting for the boards to come back, I speed-wired the rest of the interleaver address delay board and wire-wrapped the backplane (a slow and tedious process). This would allow up to five iterations to be tested. When testing the boards, I found that with no noise the decoder worked for up to 4.5 iterations but not for 5. The problem most likely lay in either the backplane wiring or the delay board itself. I decided to do some noise tests first before taking on this problem.

The tests were made using a rate 1/3 turbo code built from two identical rate 1/2, 16-state codes with polynomials g0 = 17 and g1 = 11 (in octal). A 65,536-bit S-random interleaver with S = 31 [1] was used (S = 31 means that any two bits within 31 positions of each other before interleaving are separated by at least 31 other bits after interleaving). The actual rate is reduced by a factor of 4092/4096 = 1023/1024 because the MAP block size is only 4096 bits, of which 4 bits are used as the tail. The "inner" code is terminated to state 0 while the "outer" code is not terminated. The Shannon capacity limit at this rate is at Eb/N0 = -0.55 dB (the QPSK capacity limit is at -0.49 dB).
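The S-random criterion can be checked, and a small interleaver built, with the random-selection-and-restart approach described in [1]: draw candidate positions at random and reject any that land within S of the previous S choices, restarting if no candidate fits. The sketch below is a hypothetical Python illustration of that idea (the function names are illustrative, not from any design tool). Its simple list handling makes it practical only for small sizes such as N = 256, not for the 65,536-bit interleaver used here.

```python
import random


def s_random_interleaver(n, s, seed=1, max_restarts=1000):
    """Build an S-random permutation of range(n): any two inputs within s
    positions of each other end up more than s positions apart.

    Random selection with restart; O(n^2) in the worst case, so this is a
    small-scale demonstration only.
    """
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = list(range(n))
        rng.shuffle(remaining)
        perm = []
        while remaining:
            # Take the first remaining candidate that is far enough from the
            # last s selections; give up on this attempt if none qualifies.
            for idx, cand in enumerate(remaining):
                if all(abs(cand - prev) > s for prev in perm[-s:]):
                    perm.append(remaining.pop(idx))
                    break
            else:
                break
        if not remaining:
            return perm
    raise RuntimeError("no S-random permutation found; try a smaller s")


def is_s_random(perm, s):
    """Check the spreading property for every pair of inputs at distance 1..s."""
    n = len(perm)
    return all(
        abs(perm[i] - perm[j]) > s
        for i in range(n)
        for j in range(i + 1, min(i + s + 1, n))
    )


perm = s_random_interleaver(256, 6)
print(perm[:8], is_s_random(perm, 6))
```

As a rule of thumb from [1], S below about sqrt(N/2) usually converges quickly; S = 31 for N = 65,536 sits well inside that range.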

The decoder can operate at up to 356 kbit/s but was tested at only 60 kbit/s due to limitations of the PC-based noise generator. After sigma and the signal amplitude are entered, an 8 MB noise source (each 8-bit byte containing one noise sample) is generated. Another uniform random number generator, with modulus 2^31-1 and hence period 2^31-2 [2], is then used to randomly select one of the noise samples. At a noise symbol rate of 180 ksym/s (60 kbit/s at rate 1/3), this implies that the noise source will repeat itself after about 3 hours and 19 minutes. At a BER of 10^-6, about 716 errors would be counted in that time. Personally, I would like to have a better generator, in particular the L'Ecuyer [3] combined generator with two 32-bit seeds. This generator has a period of about 2.30584x10^18. At 155.52 Mbit/s (466.56 Msym/s at rate 1/3) it would take over 156 years for the sequence to repeat!
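The generators and repeat times quoted above can be checked with a short Python sketch. The Park-Miller and L'Ecuyer recurrences follow [2] and [3], using Python's big integers instead of the overflow-avoiding tricks needed in 32-bit arithmetic. The pick_noise function, which maps a generator output to a table index with a modulo, is a hypothetical stand-in, since the report does not say how the index is formed.

```python
# Park-Miller "minimal standard" generator [2]: z <- 16807 * z mod (2^31 - 1).
PM_M, PM_A = 2**31 - 1, 16807


def park_miller(seed):
    """Yield successive Park-Miller values; seed must lie in 1..2^31 - 2."""
    z = seed
    while True:
        z = (PM_A * z) % PM_M
        yield z


# L'Ecuyer combined generator [3]: two multiplicative LCGs combined by subtraction.
M1, A1 = 2147483563, 40014
M2, A2 = 2147483399, 40692


def lecuyer(s1, s2):
    """Yield values in 1..M1 - 1; s1 must be in 1..M1 - 1 and s2 in 1..M2 - 1."""
    while True:
        s1 = (A1 * s1) % M1
        s2 = (A2 * s2) % M2
        z = s1 - s2
        if z < 1:
            z += M1 - 1
        yield z


def pick_noise(table, gen):
    """Hypothetical index mapping: select a stored noise sample by z mod table size."""
    return table[next(gen) % len(table)]


# Repeat-time arithmetic from the text.
pm_period = PM_M - 1                  # 2^31 - 2 outputs before repeating
sym_rate = 180e3                      # 60 kbit/s x 3 symbols per bit (rate 1/3)
repeat_s = pm_period / sym_rate
hours, minutes = divmod(round(repeat_s / 60), 60)
errors_at_1e6 = round(repeat_s * 60e3 * 1e-6)  # bits decoded in one period x BER

lecuyer_period = (M1 - 1) * (M2 - 1) // 2      # about 2.30584e18
years = lecuyer_period / (155.52e6 * 3) / (365.25 * 86400)

print(hours, minutes, errors_at_1e6, f"{lecuyer_period:.5e}", round(years, 1))
```

The 156-year figure only works out if each decoded bit consumes three noise samples (rate 1/3), which is the assumption built into `years`.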

Below are the results of my first tests at 0.5 and 1.0 dB. Note that half an iteration corresponds to the output of the first MAP decoder, while a full iteration corresponds to the output of the second MAP decoder.

Eb/N0   BER after the given number of iterations
(dB)    0.5        1          1.5        2          2.5        3          3.5        4          4.5
0.5     1.58x10^-1 1.10x10^-1 7.78x10^-2 5.14x10^-2 2.96x10^-2 1.32x10^-2 4.61x10^-3 2.10x10^-3 7.52x10^-4
1.0     1.29x10^-1 6.49x10^-2 2.47x10^-2 6.11x10^-3 1.00x10^-3 7.18x10^-4 2.52x10^-4 9.96x10^-4 3.21x10^-4

We can see a gradual decrease in BER as the number of iterations increases at 0.5 dB. At 1.0 dB the BER falls and then rises at four iterations. Simulations performed at JPL [4] for a rate 1/3 turbo code, but with N = 16384 (interleaver size) and I = 11 (number of iterations), achieved a BER of 10^-5 at an Eb/N0 of 0.25 dB. Clearly, my decoder was not working very well. I suspected that the problem was in the subtraction circuit after the MAP decoders.

The extrinsic information from the second MAP decoder of the previous iteration is subtracted from the output of the first MAP decoder. This extrinsic information ranges from -128 to +127. When it is added to the received noisy sample (which ranges from -31 to +31), errors can be introduced by non-linearities (saturation) in the design. For example, say we receive +31 and add the extrinsic information +127 to it. The output of the MAP decoder will be limited to +127. When we subtract the extrinsic information, the information fed into the second MAP decoder will then be 0!

To avoid this problem, I limited the extrinsic information from the second MAP decoder to the range -64 to +63. This resulted in the following performance:
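The saturation effect, and why limiting the extrinsic information helps, can be reproduced with a few lines of arithmetic. This hypothetical Python model treats the MAP output as the saturated sum of the received sample and the extrinsic input, ignoring the new extrinsic information the decoder itself contributes, so it illustrates the clipping only and is not a model of the real MAP decoder.

```python
def sat(x, lo, hi):
    """Clamp x to the range lo..hi, as a saturating adder would."""
    return max(lo, min(hi, x))


def fed_forward(received, extrinsic, ext_limit=(-128, 127), out_limit=(-128, 127)):
    """Simplified model of the MAP-output subtractor (hypothetical).

    The MAP output is modelled as sat(received + extrinsic); the (limited)
    extrinsic input is then subtracted before the result goes to the next
    decoder.
    """
    e = sat(extrinsic, *ext_limit)
    map_out = sat(received + e, *out_limit)
    return map_out - e


print(fed_forward(31, 127))                       # -> 0: the +31 sample is lost
print(fed_forward(31, 127, ext_limit=(-64, 63)))  # -> 31: sample survives
```

Limiting the extrinsic input to -64..+63 leaves enough headroom in the 8-bit output for the largest received sample (31 + 63 = 94), so nothing is clipped.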

Eb/N0   BER after the given number of iterations
(dB)    0.5        1          1.5        2          2.5        3          3.5        4          4.5
1.0     1.29x10^-1 6.49x10^-2 2.54x10^-2 6.08x10^-3 1.00x10^-3 6.44x10^-4 1.4x10^-3  7x10^-3    -

The first two results are the same since the new extrinsic information is not yet being used. The BER at 1.5 and 2 iterations is slightly worse, and the BER is the same at 2.5 iterations. After 3 iterations or more we have a slight improvement in the BER. This led me to try limiting the output of the subtraction circuit from the first MAP decoder to -64 to +63 as well. This time the improvement was much more dramatic, giving BERs of 9.81x10^-4, 1.72x10^-4, and 8.3x10^-5 at 2.5, 3, and 4.5 iterations, respectively. I then tried changing the code to the g0 = 31, g1 = 33 code given in Table I of [5].
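The g0 = 31, g1 = 33 code is a 16-state (memory 4) recursive systematic code, so 4 tail bits are enough to drive the terminated inner encoder back to state 0, which is what reduces each 4096-bit MAP block to 4092 data bits. A minimal Python sketch of such an encoder follows. It assumes that g0 is the feedback polynomial, g1 the parity polynomial, and that the most significant bit of each octal word is the D^0 coefficient; the report does not state these conventions, and the hardware encoder may differ.

```python
def taps(poly, memory):
    """Coefficients of D^0..D^memory, assuming the MSB of the octal word is D^0."""
    return [(poly >> (memory - k)) & 1 for k in range(memory + 1)]


def rsc_encode(data, g0=0o31, g1=0o33, memory=4, terminate=True):
    """Rate 1/2 recursive systematic convolutional encoder (sketch).

    Returns (systematic, parity, final_state).
    """
    fb, ff = taps(g0, memory), taps(g1, memory)
    assert fb[0] == 1, "feedback polynomial needs a D^0 term"
    reg = [0] * memory          # reg[k - 1] holds a[t - k]
    systematic, parity = [], []

    def feedback():
        # XOR of the register taps selected by g0 (excluding D^0).
        f = 0
        for k in range(1, memory + 1):
            f ^= fb[k] & reg[k - 1]
        return f

    def step(u):
        a = u ^ feedback()      # bit entering the shift register
        p = ff[0] & a
        for k in range(1, memory + 1):
            p ^= ff[k] & reg[k - 1]
        reg.insert(0, a)
        reg.pop()
        systematic.append(u)
        parity.append(p)

    for u in data:
        step(u)
    if terminate:
        # Choosing u equal to the feedback sum makes a = 0, so `memory`
        # tail bits flush the register to state 0.
        for _ in range(memory):
            step(feedback())
    return systematic, parity, tuple(reg)


sys_bits, par_bits, state = rsc_encode([1, 0, 1, 1] * 1023)  # 4092 data bits
print(len(sys_bits), state)  # -> 4096 (0, 0, 0, 0)
```

Calling it with `terminate=False` models the unterminated outer code, whose final state is generally not zero.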

There was again a dramatic improvement, with the BER dropping to 1.13x10^-7 at 1.0 dB after 4.5 iterations (an overnight test gave a BER of 8.89x10^-8). I wanted to try 5 iterations, so I examined the delay board with a logic analyser and found that one of the address wires had been accidentally cut. Fixing this gave a BER of 2.23x10^-7 for 5 iterations (still to be confirmed with an overnight test). I tried to improve the performance by tightening the limits to -32 to +31, but that gave a worse result.

One thing I noticed is that at low BERs the first MAP decoder tended to produce its errors in pairs: the ratio of bit errors to error interval counts was just above 2. The second MAP decoder produced many more single-bit errors, with a bit/interval ratio of less than 1.5. I believe this is due to the second MAP decoder being unterminated and to the small block size of the MAP decoder (there are 16 blocks of 4096 bits in each 65,536-bit interleaved data block). This could explain why the BER goes up when going from a half to a full iteration at low BERs.
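The bit/interval ratio used above can be illustrated with a small Python sketch. The report does not define how the error counter forms its intervals, so this hypothetical function assumes fixed-length windows and counts a window once if it contains at least one bit error; a pair that straddles a window boundary would be counted as two intervals under this definition.

```python
def bits_per_errored_interval(errors, interval):
    """Ratio of bit errors to errored intervals (hypothetical definition).

    `errors` is a 0/1 sequence (1 = decoded bit in error). An errored
    interval is a fixed window of `interval` bits with at least one error.
    """
    n_bits = sum(errors)
    n_intervals = sum(
        1 for start in range(0, len(errors), interval)
        if any(errors[start:start + interval])
    )
    return n_bits / n_intervals if n_intervals else 0.0


# Paired errors push the ratio towards 2; isolated errors push it towards 1.
paired = [0] * 1000
for pos in (100, 101, 400, 401, 800, 802):
    paired[pos] = 1
single = [0] * 1000
for pos in (100, 400, 800):
    single[pos] = 1
print(bits_per_errored_interval(paired, 64), bits_per_errored_interval(single, 64))
```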

Once these preliminary tests were done, I built a second delay card and completed the backplane wiring so that I could test the 6th and 7th iterations. I found that the 7th iteration was not working. The logic analyser showed what looked like misclocking in the second delay board. Examining the clock line, I found a lot of undershoot. The second delay card was built slightly differently from the first, and this had resulted in a shorter clock line. Even so, the clock line was still fairly long, and this caused the flip-flops closest to the clock source to be misclocked. I split the clock net into two, and this fixed the problem.

I then started an extensive set of tests on the codec, varying Eb/N0 from 0.0 to 1.0 dB and the number of iterations from 0.5 to 7. One remaining problem was that, after I had rearranged the pinouts for the MAP1 Xilinx chip (the forward and backward state metric calculator and path metric calculator), I couldn't compile the design for 0.0 dB. The smaller Eb/N0 had resulted in larger look-up tables for the E operand. This left less room for placing and routing the design, and the design simply wouldn't route. I wasted a lot of time placing the design manually with Xilinx's floorplanner program, but this still did not solve the problem. Finally, I thought of a way of replacing the minimum state metric calculator (MSMC) XBLOX design with a hand-made design. This saved two configurable logic blocks (CLBs) for each MSMC, a total saving of 4 CLBs. This small decrease was enough for the design to route at 0.0 dB! I also thought of other ways of reducing the CLB count, but these resulted in unroutable designs since they used too much of the routing resources.
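The E operand is not defined in this report. A common choice in log-domain MAP decoders that work with minimum metrics (consistent with the minimum state metric calculator mentioned above) is a E b = -ln(e^-a + e^-b) = min(a, b) - ln(1 + e^-|a-b|), with the correction term read from a small look-up table. The Python sketch below assumes that definition and a hypothetical `scale` (integer metric units per nat), and shows how the table lengthens as the scale grows. Whether the scale actually grows as Eb/N0 falls in this decoder is an assumption, not something stated in the report.

```python
import math


def e_op(a, b):
    """a E b = -ln(exp(-a) + exp(-b)) = min(a, b) - ln(1 + exp(-|a - b|))."""
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))


def e_table(scale):
    """Correction LUT for integer metrics, one unit = 1/scale nats (hypothetical).

    Entry d holds round(scale * ln(1 + exp(-d / scale))); the table stops at
    the first difference d for which the rounded correction becomes 0.
    """
    table = []
    d = 0
    while True:
        corr = round(scale * math.log1p(math.exp(-d / scale)))
        if corr == 0:
            return table
        table.append(corr)
        d += 1


def e_op_int(a, b, table):
    """Integer E operation: minimum of the two metrics minus a table correction."""
    d = abs(a - b)
    return min(a, b) - (table[d] if d < len(table) else 0)


for scale in (1, 2, 4, 8):
    print(scale, len(e_table(scale)))
```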

The decoder itself gets fairly warm during operation. I ran a half-hour test with no noise to make sure there was no residual error rate that could cause a problem. For some of the later tests, I put a fan on top of the codec to keep it a little cooler. The previous tests were performed with many of the boards lacking front panels (which made pulling them in and out of the rack a little difficult). Also, every time I changed the number of iterations I had to switch off the rack, put boards in or take them out, change some jumpers, power on, and download the Xilinx bit files.

To ease this task I put a front panel on each turbo/MAP printed circuit board with two switches connected to the jumper leads. One switch selects the turbo or MAP output and the other selects either the first or the second MAP decoder. In turbo mode the decoder's hard-decision output is tri-stated, since only the last decoding stage needs to provide an output. By connecting all the hard-decision outputs together, I could eliminate the multiplexer circuit on the encoder/interface card. However, one has to be careful to ensure that only one of the decoders is in MAP mode; otherwise there will be contention on the data out bus. When selecting turbo mode, the first MAP decoder has to be selected as well. In MAP mode the hard-decision output is enabled and we can select either a half or a full iteration. An image and description of the codec are given on our turbo/MAP decoder web site.
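The switch rules above amount to a simple consistency check on the rack. The Python sketch below models them; the BoardSwitches class and iterations_decoded function are hypothetical, and the sketch additionally assumes that board i in pipeline order performs iteration i + 1 and that the MAP-mode board is the last one fitted.

```python
from dataclasses import dataclass


@dataclass
class BoardSwitches:
    """Front-panel switch settings for one turbo/MAP board (hypothetical model)."""
    map_mode: bool         # True: MAP mode, hard-decision output enabled
    second_decoder: bool   # True: second MAP decoder selected


def iterations_decoded(boards):
    """Check the switch rules and return the number of iterations decoded.

    Assumes board i (0-based, in pipeline order) performs iteration i + 1.
    """
    map_boards = [i for i, b in enumerate(boards) if b.map_mode]
    if len(map_boards) != 1:
        raise ValueError("exactly one board may drive the data out bus")
    last = map_boards[0]
    if last != len(boards) - 1:
        raise ValueError("the MAP-mode board should be the last decoding stage")
    if any(b.second_decoder for b in boards[:last]):
        raise ValueError("turbo-mode boards must select the first MAP decoder")
    # A half iteration ends at the first MAP decoder, a full one at the second.
    return last + (1.0 if boards[last].second_decoder else 0.5)


rack = [BoardSwitches(False, False)] * 6 + [BoardSwitches(True, False)]
print(iterations_decoded(rack))  # -> 6.5
```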

Here are the results of my tests so far. In turbo34t.ps we plot BER against the number of iterations for Eb/N0 = 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, and 1.0 dB. Of interest is how the BER suddenly flattens after quickly decreasing for 0.4, 0.5, and 1.0 dB. Another point of interest is the potential for more iterations: the 0.2 dB curve indicates that further improvement is possible, and the 0.1 dB curve might be able to reach low BERs as well.

In turbo34f.ps we plot BER versus Eb/N0 for 6.5 and 7 iterations. We see that BERs of 10^-5 and 10^-6 are achieved at 0.32 dB and 0.38 dB, respectively. These are 0.87 dB and 0.93 dB away from the rate 1/3 capacity limit of -0.55 dB. Unfortunately, a BER of 10^-7 requires an Eb/N0 close to 1.0 dB, due to the BER flattening above 0.4 dB. We also see that above 0.4 dB, 6.5 iterations give a lower BER than 7 iterations. This may be due to the second MAP decoder being unterminated, or perhaps to other non-linearities in the decoder.
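The gaps to capacity follow from the Shannon limit for a real AWGN channel with Gaussian input, Eb/N0 >= (2^(2R) - 1)/(2R), where R is the code rate in bits per real dimension. The Python sketch below evaluates it for R = 1/3 and reproduces the 0.87 dB and 0.93 dB figures. The binary-input (QPSK) limit of -0.49 dB needs a numerical integral and is not reproduced here.

```python
import math


def shannon_limit_db(rate):
    """Minimum Eb/N0 in dB at `rate` bits per real dimension over an AWGN
    channel with unconstrained (Gaussian) input: (2^(2R) - 1) / (2R)."""
    return 10 * math.log10((2 ** (2 * rate) - 1) / (2 * rate))


limit = shannon_limit_db(1 / 3)
print(f"rate 1/3 limit: {limit:.2f} dB")  # -> rate 1/3 limit: -0.55 dB
for ber, ebno in (("1e-5", 0.32), ("1e-6", 0.38)):
    print(f"BER {ber} at {ebno} dB: {ebno - limit:.2f} dB from the limit")
```

The small 1023/1024 rate loss from the tail bits moves the limit by only about 0.001 dB, which is below the precision quoted here.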

Better performance may be achieved with a better designed interleaver and optimisation of the limiting in the subtractors after the MAP decoders. Unfortunately, I don't have time to try these things as I want to get the paper on this completed as well as get some results for a rate 1/7 turbo decoder.

I would like everyone to know that I have formed my own company called Small World Communications. I plan to design error control decoders for programmable gate arrays.

References

  1. D. Divsalar and F. Pollara, "Multiple turbo codes for deep-space communications," JPL TDA Progress Report, vol. 42-121, pp. 66-77, Jan.-Mar. 1995.

  2. S. K. Park and K. W. Miller, "Random number generators: Good ones are hard to find," Commun. of the ACM, vol. 31, pp. 1192-1201, Oct. 1988.

  3. P. L'Ecuyer, "Efficient and portable combined random number generators," Commun. of the ACM, vol. 31, pp. 742-749, 774, June 1988.

  4. D. Divsalar and F. Pollara, "On the design of turbo codes," JPL TDA Progress Report, vol. 42-123, pp. 99-121, July-Sep. 1995.

  5. S. Benedetto and G. Montorsi, "Design of parallel concatenated convolutional codes," IEEE Trans. Commun., vol. 44, pp. 591-600, May 1996.

Steven S. Pietrobon, Satellite Communications Research Centre
University of South Australia, The Levels SA 5095, Australia.
Steven.Pietrobon@unisa.edu.au http://www.itr.unisa.edu.au/~steven/