





## Architecture Instruction Taxonomy (CS184a)

| Control Threads (PCs) | Pinsts per Control Thread | Instruction Depth | Granularity | Architecture/Examples |
|---|---|---|---|---|
| 0 | 0 | 0 | n/a | Hardwired Functional Unit (e.g., ECC/EDC Unit, FP MPY) |
| 1 | n | 1 | 1 | FPGA |
| 1 | n | 1 | w | Reconfigurable ALUs |
| 1 | 1 | c | $n_v \cdot 1$ | Bitwise SIMD |
| 1 | 1 | c | w | Traditional Processors |
| 1 | 1 | c | $n_v \cdot w$ | |
| 1 | n | c | 1 | DPGA |
| 1 | n | 8 | 16 | PADDI |
| 1 | n | c | w | VLIW |
| m | n | | 1 | HSRA/SCORE |
| m | | c | $n_v \cdot w$ | MSIMD |
| m | | c | | VEGA |
| m | 1 | 8 | 16 | PADDI-2 |
| m | 1 | | w | MIMD (traditional) |

CALTECH cs184c Spring2001 -- DeHon
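As an illustrative sketch, the fully specified rows of the taxonomy can be encoded as a lookup keyed by the four parameters. The `TAXONOMY` dictionary and `classify` helper are hypothetical names, not from the course. Symbolic entries (`n`, `c`, `w`, `m`) are kept as strings exactly as they appear in the table, and rows with blank cells are left out.

```python
# Hypothetical lookup over the fully specified rows of the taxonomy table.
# Key: (control threads, pinsts per thread, instruction depth, granularity).
# Symbolic entries are kept as strings, as in the table.
TAXONOMY = {
    ("0", "0", "0", "n/a"): "Hardwired Functional Unit",
    ("1", "n", "1", "1"): "FPGA",
    ("1", "n", "1", "w"): "Reconfigurable ALUs",
    ("1", "1", "c", "n_v*1"): "Bitwise SIMD",
    ("1", "1", "c", "w"): "Traditional Processors",
    ("1", "n", "c", "1"): "DPGA",
    ("1", "n", "8", "16"): "PADDI",
    ("1", "n", "c", "w"): "VLIW",
    ("m", "1", "8", "16"): "PADDI-2",
}


def classify(threads: str, pinsts: str, depth: str, granularity: str) -> str:
    """Return the example architecture for a parameter tuple, or 'unlisted'."""
    return TAXONOMY.get((threads, pinsts, depth, granularity), "unlisted")


if __name__ == "__main__":
    print(classify("1", "n", "1", "1"))  # FPGA
    print(classify("1", "n", "c", "1"))  # DPGA
    print(classify("m", "n", "1", "1"))  # unlisted
```

Keying on the full tuple makes it explicit that FPGA and DPGA differ only in instruction depth (1 versus c).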















































## Calibrate Model (CS184a)

Area predicted by the model for each architecture class, compared with measured implementations of that class:

| Architecture | Model / Implementation | Area |
|---|---|---|
| FPGA | Model: $w = 1$, $d = c = 1$, $k = 4$ | 880K $\lambda^2$ |
| | Xilinx 4K | 630K $\lambda^2$ |
| | Altera 8K | 930K $\lambda^2$ |
| SIMD | Model: $w = 1000$, $c = 0$, $d = 64$, $k = 3$ | 170K $\lambda^2$ |
| | Abacus | 190K $\lambda^2$ |
| Processor | Model: $w = 32$, $d = 32$, $c = 1024$, $k = 2$ | 2.6M $\lambda^2$ |
| | MIPS-X | 2.1M $\lambda^2$ |
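Calibrating the model amounts to comparing its predicted area against real designs of the same class. As a small worked-arithmetic sketch (the `CALIBRATION` list and `model_ratio` helper are hypothetical names; the numbers are copied from the table), the model-to-measured ratios are:

```python
# Model-predicted vs. measured areas from the calibration table (units: lambda^2).
CALIBRATION = [
    # (class, implementation, model area, measured area)
    ("FPGA", "Xilinx 4K", 880e3, 630e3),
    ("FPGA", "Altera 8K", 880e3, 930e3),
    ("SIMD", "Abacus", 170e3, 190e3),
    ("Processor", "MIPS-X", 2.6e6, 2.1e6),
]


def model_ratio(model: float, measured: float) -> float:
    """Ratio of the model's prediction to the measured area (1.0 = exact match)."""
    return model / measured


for cls, impl, model, measured in CALIBRATION:
    print(f"{cls:9s} {impl:9s} model/measured = {model_ratio(model, measured):.2f}")
```

The ratios come out to 1.40, 0.95, 0.89, and 1.24, so each model prediction lands within roughly 40% of the corresponding measured design.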















## Abacus: Cycles

| Operation | 8-bit Cycles | 8-bit GOPS | 16-bit Cycles | 16-bit GOPS | 32-bit Cycles | 32-bit GOPS |
|---|---|---|---|---|---|---|
| Add | 4 | 4.0 | 4 | 2.0 | | 0.7 |
| Shift | 2 | 8.0 | 2 | 4.0 | | 2.0 |
| Accumulate | 3 | 5.2 | 3 | 2.6 | 3 | 1.3 |
| Move | 3 | 5.2 | 4 | 2.0 | 6 | 0.6 |
| Compare | 6 | 2.6 | 11 | 0.6 | 12 | 0.2 |
| Multiply ($16 \times 16$) | | | | | 180 | 0.03 |

| Algorithm | Cycles | Time (μs) |
|---|---|---|
| Edge Detection, $\sigma = 1.6$ | | 3 |
| Optical Flow, $\Delta = 2$, $5 \times 5$ region | | 4 |
| Surface Reconstruction (1 iteration) | 370 | 3 |
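Reading the 8-bit and 16-bit columns together, throughput appears to scale inversely with cycle count: GOPS × cycles stays nearly constant within a width column and roughly halves when the width doubles. A minimal arithmetic check, assuming GOPS ≈ (single-cycle peak rate) / cycles; the `ABACUS` table and `implied_peak` helper are hypothetical names, with values copied from the slide:

```python
# (operation, cycles, GOPS) entries copied from the 8-bit and 16-bit columns.
ABACUS = {
    8: [("Add", 4, 4.0), ("Shift", 2, 8.0), ("Accumulate", 3, 5.2),
        ("Move", 3, 5.2), ("Compare", 6, 2.6)],
    16: [("Add", 4, 2.0), ("Shift", 2, 4.0), ("Accumulate", 3, 2.6),
         ("Move", 4, 2.0), ("Compare", 11, 0.6)],
}


def implied_peak(cycles: int, gops: float) -> float:
    """Assuming GOPS = peak / cycles, recover the GOPS a one-cycle operation would reach."""
    return gops * cycles


for width, rows in ABACUS.items():
    peaks = [round(implied_peak(c, g), 1) for _, c, g in rows]
    print(width, peaks)
```

This prints `[16.0, 16.0, 15.6, 15.6, 15.6]` for 8 bits and `[8.0, 8.0, 7.8, 8.0, 6.6]` for 16 bits; the 16-bit compare is the one entry that departs noticeably from the pattern.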









## T0 ASM example

```asm
lhai.v  vv1, t0, t1    # Vector load.
hmul.vv vv4, vv2, vv3  # Vector mul.
sadd.vv vv7, vv5, vv7  # Vector add.
addu    t2, -1         # Scalar add.
lhai.v  vv2, t0, t1    # Vector load.
hmul.vv vv5, vv1, vv3  # Vector mul.
sadd.vv vv8, vv4, vv8  # Vector add.
addu    t7, t4         # Scalar add.
```
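The listing appears to be software-pipelined: the second `hmul.vv` reads `vv1`, loaded by the first `lhai.v`, and the second `sadd.vv` reads `vv4`, produced by the first `hmul.vv`. The sketch below checks whether this interleaving lets every instruction issue without waiting for a busy unit. It rests on assumptions not stated on the slide: in-order single issue; each vector instruction occupies its unit for VLEN/LANES = 32/8 = 4 cycles; load, multiply, and add each use a separate unit; data dependences are ignored. The unit names and helpers are hypothetical.

```python
# Assumed T0-like parameters (not stated on the slide).
VLEN = 32   # elements per vector instruction (assumption)
LANES = 8   # elements a vector unit processes per cycle (assumption)

# Hypothetical mapping from opcode to the unit it occupies.
UNIT = {"lhai.v": "vmem", "hmul.vv": "varith0", "sadd.vv": "varith1", "addu": "scalar"}

PROGRAM = ["lhai.v", "hmul.vv", "sadd.vv", "addu",
           "lhai.v", "hmul.vv", "sadd.vv", "addu"]


def occupancy(op: str) -> int:
    """Cycles an instruction keeps its unit busy: VLEN // LANES for vector ops, 1 for scalar."""
    return 1 if UNIT[op] == "scalar" else VLEN // LANES


def schedule(program: list[str]) -> list[int]:
    """In-order, single issue: an instruction issues no earlier than one cycle after
    its predecessor and no earlier than its unit is free. Data dependences are ignored."""
    free_at: dict[str, int] = {}
    issue_cycles = []
    cycle = 0
    for op in program:
        unit = UNIT[op]
        cycle = max(cycle, free_at.get(unit, 0))  # stall if the unit is still busy
        issue_cycles.append(cycle)
        free_at[unit] = cycle + occupancy(op)
        cycle += 1
    return issue_cycles


print(schedule(PROGRAM))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Under these assumptions the eight instructions issue on consecutive cycles: each vector unit frees up exactly when its next instruction arrives four cycles later. Setting `VLEN = 64` would instead delay the second `lhai.v` to cycle 8.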






