HP Text Chapter 1
(John L. Hennessy and David A. Patterson, Computer Architecture: A
Quantitative Approach, Third Edition, Morgan Kaufmann, 2003.)
G. M. Amdahl, ``Validity of the single-processor approach to achieving large scale computing capabilities,'' In AFIPS Conference Proceedings,
pp. 483--485, April 1967.
G. Radin, ``The 801 Minicomputer,'' In Proceedings of the
International Symposium on Architectural Support for Programming
Languages and Operating Systems, pp. 39--47, March 1982. [abstract and paper
link]
D. A. Patterson and D. R. Ditzel, ``The Case for the Reduced
Instruction Set Computer,'' in ACM SIGARCH Computer Architecture
News, vol. 8, pp. 25--33, October, 1980.
[abstract and paper
link]
J. S. Emer and D. W. Clark, ``A Characterization of Processor
Performance in the VAX--11/780,'' In Proceedings of the 11th
International Symposium on Computer Architecture, pp. 301--310, June 1984.
[abstract and paper link]
J. S. Emer and D. W. Clark, ``Retrospective: A Characterization of
Processor Performance in the VAX--11/780,'' In 25 Years of the
International Symposia on Computer Architecture: Selected Papers,
pp. 37--38, 1998.
[abstract and paper link]
Bhandarkar, D. and Clark, D.W.,
``Performance from Architecture: Comparing a RISC and a CISC with Similar Hardware Organization,''
In Proceedings of the Fourth International Conference on Architectural Support for
Programming Languages and Operating Systems, pp. 310-319,
April, 1991. [abstract
and paper link]
L. M. Ni and P. K. McKinley, ``A survey of Wormhole Routing Techniques
in Direct Networks,'' IEEE Computer, 26(2):62--76, 1993.
[abstract
and PDF link]
Supplemental:
B. M. Maggs, ``Randomly wired Multistage Networks,'' in Statistical
Science, 8(1):70--74, February, 1993.
[Abstract and
Links] -- good survey/starting point.
S. Arora, B. M. Maggs, and F. T. Leighton, ``On-line Algorithms for
Path Selection in a Nonblocking Network,''
in SIAM Journal on Computing, 25(3):600--625, June 1996.
[Abstract and
links] -- when you want all the details and proofs.
Frederic Chong, Eran Egozy, and Andre DeHon.
Fault Tolerance and Performance of Multipath Multistage
Interconnection Networks.
In Thomas F. Knight Jr. and John Savage, editors, Advanced
Research in VLSI and Parallel Systems 1992, pages 227-242. MIT Press, March
1992.
[PDF][PS]
William J. Dally and Charles L. Seitz, ``The Torus Routing Chip,''
Distributed Computing 1:187--196, 1986. [PDF
of article as Caltech TR]
R. M. Tomasulo, ``An Efficient Algorithm for Exploiting Multiple
Arithmetic Units,'' In IBM Journal of Research and Development,
Volume 11, pp. 25--33, January, 1967. [PDF]
Vikas Agarwal, M. S. Hrishikesh, Stephen W. Keckler, and Doug
Burger. ``Clock rate versus IPC: the end of the road for conventional
microarchitectures.''
In International Symposium on Computer
Architecture, 2000, pp. 248--259.
[abstract and paper
links]
Subbarao Palacharla, Norman P. Jouppi, and James E. Smith.
``Complexity-Effective Superscalar Processors.''
In International Symposium on Computer
Architecture, 1997.
[PDF Link]
Joseph A. Fisher, ``Very Long Instruction Word Architectures and the
ELI-512,'' In The Tenth International Symposium on Computer
Architecture, 1983.
[PDF]
Joseph A. Fisher, ``Retrospective: Very Long Instruction Word
Architectures and the ELI-512,'' In 25 Years of the International
Symposia on Computer Architecture: Selected Papers, pp. 34--36, 1998.
[abstract and paper
link]
Joseph A. Fisher and Stefan M. Freudenberger, ``Predicting Conditional
Branch Directions from Previous Runs of a Program,'' In Proceedings of
the Fifth International Conference on Architectural Support for
Programming Languages and Operating Systems, pp. 85--95, 1992.
[abstract and paper
link]
Erik R. Altman, David Kaeli, and Yaron Sheffer, ``Welcome to the Opportunities of Binary Translation,'' IEEE Computer, Volume 33, Number 3,
pp. 40--45, March, 2000. [abstract and paper link]
Cindy Zheng and Carol Thompson, ``PA-RISC to IA-64: Transparent Execution, No Recompilation,'' IEEE Computer, Volume 33, Number 3,
pp. 47--52, March, 2000. [abstract and paper link]
Michael Gschwind, Erik R. Altman, Sumedh Sathaye, Paul Ledak, and David
Appenzeller, ``Dynamic and Transparent Binary Translation,'' IEEE
Computer, Volume 33, Number 3, pp. 54--59, March, 2000. [abstract
and paper link]
Supplemental:
Richard L. Sites, Anton Chernoff, Matthew B. Kirk, Maurice P. Marks,
and Scott G. Robinson, ``Binary Translation,'' In Digital Technical
Journal, Volume 4, Number 4, 1992. [PDF]
Alexander Klaiber,
``The Technology Behind Crusoe(TM) Processors'', January, 2000.
[PDF]
Peter Markstein, IA-64 and Elementary Functions, Hewlett-Packard, 2000. (read pp. 9--40 for class)
Sally McKee. ``Reflections on the Memory Wall,'' in
Proceedings of the Conference on Computing Frontiers,
p. 162, 2004. [abstract and paper
link]
Johannes M. Mulder, Nhon T. Quach, and Michael J. Flynn. ``An Area
Model for On-Chip Memories and its Application,'' IEEE Journal of
Solid State Circuits, Volume 26, Number 2, pp. 98--106, February,
1991.
[abstract
and paper link]
W. Daniel Hillis and Guy L. Steele, ``Data Parallel Algorithms,''
CACM 29(12):1170--1183, 1986.
[abstract and paper link]
Supplemental:
Appendix G of text
(John L. Hennessy and David A. Patterson, Computer Architecture: A
Quantitative Approach, Third Edition, Morgan Kaufmann, 2003.)
[available online]
M. Bolotski, T. Simon, C. Vieri, R. Amirtharajah, T. F. Knight, Jr. ``Abacus: a 1024 processor 8 ns SIMD array.'' Proceedings. Sixteenth
Conference on Advanced Research in VLSI. IEEE Comput. Soc. Press. 1995, pp.28-40. Los Alamitos, CA, USA.
J. Wawrzynek, K. Asanovic, B. Kingsbury, J. Beck, D. Johnson, and N. Morgan. SPERT-II: A Vector Microprocessor System. IEEE Computer, March
1996. [abstract
and paper link]
Michael Noakes, Deborah Wallach, and William J. Dally.
``The J-Machine Multicomputer: An Architectural Evaluation,''
In Proceedings of the 20th International Symposium on
Computer Architecture, pp. 224--235, May 1993.
[abstract and paper
links]
William J. Dally, Andrew Chang, Andrew Chien, Stuart Fiske,
Waldemar Horwat, John Keen, Richard Lethin, Michael Noakes,
Peter Nuth, Ellen Spertus, Deborah Wallach, and D. Scott Wills,
``Retrospective: The J-Machine,''
In 25 Years of the International Symposia on Computer Architecture: Selected Papers, pp. 54--78, 1998.
[abstract and paper
links]
Thorsten von Eicken, David E. Culler, Klaus Erik Schauser and Seth
Copen Goldstein, ``Retrospective: Active Messages: A Mechanism for
Integrating Computation and Communication,'' In 25 Years of the
International Symposia on Computer Architecture: Selected Papers,
pp. 83--84, 1998.
[abstract and
paper links]
Dana S. Henry and Christopher F. Joerg. ``A Tightly-Coupled
Processor-Network Interface,'' In Proceedings of the Fifth
International Conference on Architectural Support for Programming Languages
and Operating Systems, Boston, MA, October 1992.
[abstract
and paper links]
Supplemental:
J. Saltzer, D. Reed, and D. Clark. ``End-To-End Arguments in
System Design,'' In ACM Transactions on Computer
Systems, volume 2, Number 4, pp. 277-288, Nov. 1984.
[PDF]
David Patterson, Garth Gibson, and Randy Katz, ``A Case for
Redundant Arrays of Inexpensive Disks (RAID),'' In Proceedings of
the ACM SIGMOD Conference, pp. 109--116, June 1988.
[abstract and paper
links]
Supplemental:
Brent Keeth and Jacob Baker, DRAM Circuit Design: A Tutorial,
IEEE Press, 2001, pp. 108--116.
Ashok K. Sharma, Semiconductor Memories, IEEE Press, 1997,
pp. 230--248.
André DeHon, Frederic Chong, Matthew Becker, Eran Egozy,
Henry Minsky, Samuel Peretz, and Thomas F. Knight, Jr.,
``Transit Note #97: METRO: A Building Block for Fault-Tolerant,
Multiprocessor Routing Networks,'' [compressed PS]
D. M. Tullsen, S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo, and
R. L. Stamm, ``Exploiting choice: Instruction fetch and issue on an
implementable simultaneous multithreading processor,'' Proceedings of
the 23rd Annual International Symposium on Computer Architecture,
pp. 191--202, May 1996.
[abstract and paper
links]
James Burns and Jean-Luc Gaudiot, ``Quantifying the SMT Layout Overhead---Does SMT Pull Its Weight?,'' Proceedings of the International Symposium on High-Performance Computer Architecture, pp. 109--120, 2000.
[abstract
and paper links]
Mark S. Papamarcos and Janak H. Patel, ``A Low-Overhead Coherence
Solution for Multiprocessors with Private Cache Memories,'' In
Proceedings of the 11th International Symposium on Computer
Architecture, 1984.
[abstract and
paper links]
Janak H. Patel, ``Retrospective: A Low-Overhead Coherence Solution for
Multiprocessors with Private Cache Memories,'' In 25 Years of the
International Symposia on Computer Architecture: Selected Papers,
pp. 39--41, 1998.
[abstract and
paper links]
Anant Agarwal, ``Retrospective: The MIT Alewife Machine: Architecture and Performance,''
In 25 Years of the International Symposia on Computer Architecture:
Selected Papers, pp. 103--110, 1998.
[abstract and paper
links]
John Hennessy, ``Retrospective: Evaluation of Directory Schemes for Cache Coherence,''
In 25 Years of the International Symposia on Computer Architecture:
Selected Papers, pp. 61--62, 1998.
[abstract and paper
links]
G. M. Papadopoulos and D. E. Culler, ``Monsoon: An Explicit Token Store Architecture,'' In Proceedings of the 17th International Symposium on
Computer Architecture, Seattle, Washington, May 1990.
[abstract and paper links]
G. M. Papadopoulos and D. E. Culler, ``Retrospective: Monsoon: An Explicit Token Store Architecture,''
In 25 Years of the International Symposia on Computer Architecture:
Selected Papers, pp. 74--76, 1998.
[abstract and paper links]
Arvind and David E. Culler, Dataflow Architectures MIT/LCS/TM-294, February 1986.
R. S. Nikhil, G. M. Papadopoulos and Arvind, ``*T: A Multithreaded
Massively Parallel Architecture,'' In Proceedings of The 19th Annual
International Symposium on Computer Architecture, Gold Coast,
Australia, May 1992, (15 pages).
[abstract and paper links]
John R. Hauser and John Wawrzynek. ``Garp: A MIPS Processor with a
Reconfigurable Coprocessor,'' in Proceedings of the IEEE Symposium on
Field-Programmable Custom Computing Machines (FCCM '97, April 16-18,
1997), pp. 24-33.
[Abstract and pointers] (N.B. earlier version with more
architecture details...can probably scan through parts of it after reading previous)
Supplemental:
Vincent Michael Bove, Jr. and John A. Watlington.
Cheops: A Reconfigurable Data-Flow System for Video Processing.
IEEE Transactions on Circuits and Systems for Video Technology,
5(2):140--149, April 1995. [HTML]