Thread View: comp.arch
16 messages
Started by kruger@16bits.de
Tue, 01 Mar 1988 22:07
Harvard Architecture
Author: kruger@16bits.de
Date: Tue, 01 Mar 1988 22:07
11 lines
415 bytes
a) An architecture working at a prestigious law firm, making over $40K to
start.

b) An architecture that has separate instruction and data busses so that
instruction fetches don't stop the processor from doing data transfers,
which is what it's supposed to be doing!

Motorola claims that the 68030 has an "internal Harvard architecture" by
which they mean it has separate internal instruction and data caches.

dov
Re: Harvard Architecture
Author: brucek@hpsrla.HP
Date: Mon, 07 Mar 1988 21:55
16 lines
473 bytes
+-------
| Motorola claims that the 68030 has an "internal Harvard architecture" by
| which they mean it has separate internal instruction and data caches.
+-------

Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)

Bruce Kleinman
brucek%hpnmd@hpcea.hp.com -or- ...hplabs!hpnmd!brucek
Hewlett Packard - Network Measurements Division
Santa Rosa, California
Re: Harvard Architecture
Author: lindsay@K.GP.CS.
Date: Wed, 09 Mar 1988 05:22
17 lines
659 bytes
In article <3460011@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman)
writes about the 68030:
>Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)

Actually, it will. Remember, the CDC 6600 got a win from an "instruction
stack" of 480 bits !

Plus, the two caches access in parallel (versus the one cache of the 68020).
Plus, the caches now take one clock (versus 2 on the 68020).
Plus, the caches now have burst refill (if the board designer supports it,
of course.)

All in all, a clear improvement. I don't hear any suggestions as to a better
use for the silicon.
--
Don lindsay@k.gp.cs.cmu.edu CMU Computer Science
Re: Harvard Architecture
Author: bcase@Apple.COM
Date: Wed, 09 Mar 1988 20:07
20 lines
795 bytes
In article <1071@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
>In article <3460011@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman)
> writes about the 68030:
>>Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)

Talking about 68030.

>
>Actually, it will. Remember, the CDC 6600 got a win from an "instruction
>stack" of 480 bits !
>
>All in all, a clear improvement. I don't hear any suggestions as to a better
>use for the silicon.

A little birdie with an EE degree told me that you can expect maybe a 20%
improvement over a 68020 at the same clock rate. An improvement, yes, but
a better use of silicon might have been some on-chip floating point. Or
how about more pins so as to expose the harvard architecture to the external
world?
Re: Harvard Architecture
Author: bcase@Apple.COM
Date: Wed, 09 Mar 1988 20:14
8 lines
375 bytes
In article <7614@apple.Apple.Com> bcase@apple.UUCP (Brian Case) writes:
>A little birdie with an EE degree told me that you can expect maybe a 20%
>improvement over a 68020 at the same clock rate.

Oops, I should have said that the little birdie also showed me the system
running real stuff. That is one amazing little bird (has trouble holding
the soldering iron though).
Re: Harvard Architecture
Author: brucek@hpsrla.HP
Date: Thu, 10 Mar 1988 20:09
39 lines
1726 bytes
+-------
| >Ahh, those massive 256-Byte caches are really going to speed this
| > puppy up :-)
|
| Actually, it will. Remember, the CDC 6600 got a win from an "instruction
| stack" of 480 bits !
|
| Plus, the two caches access in parallel (versus the one cache of the 68020).
| Plus, the caches now take one clock (versus 2 on the 68020).
| Plus, the caches now have burst refill (if the board designer supports it,
| of course.)
|
| All in all, a clear improvement. I don't hear any suggestions as to a better
| use for the silicon.
+-------

All in all a clear improvement? Over the '020, perhaps ...

The 256-byte data cache is of questionable value, as the miss-rate will be
fairly high. I would supply numbers for "fairly high," but I can't seem to
find any miss-rate data for caches smaller than 1 Kbyte. Furthermore, the
restricted implementation of the data cache makes it useless in
multi-processor systems as well as systems with DMA. The data cache has no
facilities for coherency. Solution - disable the data cache or flush, flush
away.

Suggestions as to a better use for the silicon? OK ...

Expand the I-cache. I suspect that losing the D-cache could make room for a
1 Kbyte I-cache. This would offer honest improvements in almost all systems.
High-perf machines can surround the '030 with a hefty sys-cache for increased
performance. Memory bandwidth problems, you say? Bring the Harvard
architecture out to the pins - which is exactly what Motorola did for the
88000.

Bruce Kleinman
brucek%hpnmd@hpcea.hp.com -or- ...hplabs!hpnmd!brucek
Hewlett Packard - Network Measurements Division
Santa Rosa, California
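
The coherency worry in the post above (a data cache that is not told about DMA writes) can be illustrated with a toy model. This is only a sketch under stated assumptions: the dictionary-based memory, the function names, and the address 0x2000 are all hypothetical, and nothing here models the real 68030 cache controls. It just shows why "flush, flush away" is the fallback when nothing keeps the cache coherent.

```python
# Toy model (hypothetical names throughout): a CPU-side data cache with no
# snooping, and a DMA device that writes straight to main memory.
memory = {0x2000: 1}   # main memory: address -> value
cache = {}             # CPU data cache: address -> cached value


def cpu_read(addr):
    """Read through the cache; on a miss, load the value from memory."""
    if addr not in cache:
        cache[addr] = memory[addr]
    return cache[addr]


def dma_write(addr, value):
    """A device writes to memory; the cache is never informed."""
    memory[addr] = value


def flush_cache():
    """Invalidate every cached entry so the next reads go to memory."""
    cache.clear()


first = cpu_read(0x2000)   # miss: loads 1 from memory
dma_write(0x2000, 99)      # memory now holds 99, the cache still holds 1
stale = cpu_read(0x2000)   # hit: returns the old value
flush_cache()
fresh = cpu_read(0x2000)   # miss again: picks up the DMA-written value

print(first, stale, fresh)
```

The second read returns the stale value because the hit never consults memory; only the flush makes the DMA data visible, at the cost of throwing away everything else in the cache.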
CPU chip cache sizes, was Re: Harvard Architecture
Author: grenley@nsc.nsc.
Date: Fri, 11 Mar 1988 20:29
29 lines
929 bytes
In article <1071@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
>In article <3460011@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman)
> writes about the 68030:
>>Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)

I'm sure they will. Heaven knows it needs it...

>Plus, the two caches access in parallel (versus the one cache of the 68020).

As do the two caches on the 32532.

>Plus, the caches now take one clock (versus 2 on the 68020).

Likewise on the 532.

>Plus, the caches now have burst refill (if the board designer supports it,
>of course.)

So does the 532. We also have larger caches (512 instruction, 1K 2 way
set associative data). I have seen the studies on hit rate vs size for
the 532; since the 030 is roughly similar architecture I expect they have
the same tradeoffs. 256 bytes is better than no bytes, but it is still
pretty small.

George Grenley
NSC
Re: CPU chip cache sizes, was Re: Harvard Architecture
Author: mash@mips.COM (J
Date: Sat, 12 Mar 1988 03:45
21 lines
1089 bytes
In article <5009@nsc.nsc.com> grenley@nsc.UUCP (George Grenley) writes:
>So does the 532. We also have larger caches (512 instruction, 1K 2 way
>set associative data). I have seen the studies on hit rate vs size for
>the 532; since the 030 is roughly similar architecture I expect they have
>the same tradeoffs. 256 bytes is better than no bytes, but it is still
>pretty small.

I recall there was speculation when the 68030 was announced that the
D-cache might actually cost you performance in general applications, and
that people would end up turning it off [unlike the I-cache, where even a
small cache is almost always useful]. However, I've seen no data published
one way or another on this yet, and I don't have any.

Do you (or anybody else) have any good data on a 256-byte cache with
16 16-byte lines? (i.e., the 68030 D-cache)
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
Re: CPU chip cache sizes, was Re: Harvard Architecture
Author: north@Apple.COM
Date: Mon, 14 Mar 1988 17:50
29 lines
1843 bytes
In article <1853@winchester.mips.COM> mash@winchester.UUCP (John Mashey) writes:
>I recall there was speculation when the 68030 was announced that the
>D-cache might actually cost you performance in general applications,
>Do you (or anybody else) have any good data on a 256-byte cache with
>16 16-byte lines? (i.e., the 68030 D-cache)

Having had some '030 experience as of late, I have found that (in real working
hardware) the D-cache is always a 'win', even though it is small by most
standards. I have yet to find a benchmark (Dhrystone, any others of the small
integer class) or some real application code in which the performance is less
when the cache is enabled than disabled. The typical performance improvement
ranges from a low of 5% to a high of 25%, 'average' for larger applications
appears to be about 20%.

In looking at the cache organization (direct mapped, 16 16-byte lines) one
could construct a sequence of references (accessing data locations exactly
256 bytes apart, for example) that causes thrashing in particular cache
lines. This is a problem in all direct mapped caches, and one would think it
to be especially severe with so few entries (16 in the '030's case). I
suspect this is one reason for the relatively low performance improvement
figures; the other is that the cache is just too small to be 'really' useful
except in limited situations. Two that come immediately to mind are pushing
stack arguments that are then accessed relatively soon, and accesses to
local stack storage.

Don North ----- Apple Computer, Inc. ----- Advanced Technology Group
UUCP: {voder,nsc,dual,sun}!apple!north CSNET: north@apple.com
{{ Facts are facts, but any opinions expressed are my own, and *do not* }}
{{ represent any viewpoint, official or otherwise, of Apple Computer, Inc.}}
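
The thrashing pattern described above (references exactly 256 bytes apart in a direct mapped cache of 16 16-byte lines) can be sketched as a small simulation. This is a minimal model, not a description of the real chip: the geometry comes from the post, but the function name, the example addresses, and the simplification that a miss refills a whole line are assumptions, and details such as valid bits and write policy are ignored.

```python
# Minimal direct-mapped cache model. The geometry (16 lines of 16 bytes)
# follows the post above; everything else is an illustrative assumption.
LINE_BYTES = 16
NUM_LINES = 16


def count_hits(addresses):
    """Return how many of the given byte-address references hit."""
    tags = [None] * NUM_LINES          # one stored tag per line; None = empty
    hits = 0
    for addr in addresses:
        line_number = addr // LINE_BYTES
        index = line_number % NUM_LINES    # which cache line is used
        tag = line_number // NUM_LINES     # identifies the 256-byte block
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag              # miss: replace the line
    return hits


# Two addresses exactly 256 bytes apart share an index but differ in tag,
# so alternating between them evicts the line on every reference.
thrash = [0x1000, 0x1100] * 8

# Two addresses 16 bytes apart land in different lines and coexist.
friendly = [0x1000, 0x1010] * 8

print(count_hits(thrash), "hits out of", len(thrash))
print(count_hits(friendly), "hits out of", len(friendly))
```

In this model the 256-byte stride never hits, while the 16-byte stride misses only on the first touch of each line, which is the behaviour the post predicts for a cache with only 16 entries.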
Re: CPU chip cache sizes, was Re: Harvard Architecture
Author: lindsay@K.GP.CS.
Date: Wed, 16 Mar 1988 18:31
29 lines
1515 bytes
In article <7672@apple.Apple.Com> north@apple.UUCP (Donald N. North) writes:
>Having had some '030 experience as of late, I have found that (in real working
>hardware) the D-cache is always a 'win', even though it is small by most
>standards. I have yet to find a benchmark (Dhrystone, any others of the small
>integer class) or some real application code in which the performance is less
>when the cache is enabled than disabled. The typical performance improvement
>ranges from a low of 5% to a high of 25%, 'average' for larger applications
>appears to be about 20%.

You didn't mention how many wait states on a memory access. Also, you didn't
mention if the board supports burst-fill. I assume that the 20% is for a hot
board.

I would expect that the cache gets more useful as boards get slower. In
particular, the 68030 should be much more useful than a 68020 when given a
slow 8-bit-wide memory - i.e. a minimum configuration.

I was recently surprised to learn that 68020 minimum configurations weren't
just showing up in minimum-cost systems. Apparently, some are embedded in
other systems, doing the sort of thing that an 8-bitter could hack (like,
hardware diagnostics). I assume that the 68030 will show up eventually in
this role. Of course, 68020's have also been used as IO controllers and the
like.

Does anyone have insight into the minimum/maximum aspects of these uses, or
the likelihood of SPARC/MIPS/etc pushing into these roles ?
--
Don lindsay@k.gp.cs.cmu.edu CMU Computer Science
Re: Harvard Architecture
Author: alan@pdn.UUCP (A
Date: Fri, 18 Mar 1988 22:02
30 lines
1189 bytes
In article <7614@apple.Apple.Com> bcase@apple.UUCP (Brian Case) writes:
/In article <1071@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
/>In article <3460011@hpsrla.HP.COM> brucek@hpsrla.HP.COM (Bruce Kleinman)
/> writes about the 68030:
/>>Ahh, those massive 256-Byte caches are really going to speed this puppy up :-)
/
/Talking about 68030.
/
/>
/>Actually, it will. Remember, the CDC 6600 got a win from an "instruction
/>stack" of 480 bits !
/>
/>All in all, a clear improvement. I don't hear any suggestions as to a better
/>use for the silicon.
/
/A little birdie with an EE degree told me that you can expect maybe a 20%
/improvement over a 68020 at the same clock rate. An improvement, yes, but
/a better use of silicon might have been some on-chip floating point. Or
/how about more pins so as to expose the harvard architecture to the external
/world?

The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
cache turned on (compared to its being turned off). Apparently the
'030 gets **AT LEAST** a 30% performance boost from turning on the data
cache (so I have been told by those who have benchmarked one).

Enough said.

--alan@pdn
Re: Harvard Architecture
Author: mash@mips.COM (J
Date: Sat, 19 Mar 1988 21:41
34 lines
1697 bytes
In article <2594@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
>The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
>cache turned on (compared to its being turned off). Apparently the
>'030 gets **AT LEAST** a 30% performance boost from turning on the data
>cache (so I have been told by those who have benchmarked one).
>
>Enough said.

1) Even small I-caches are almost always useful: the question that
started this all was whether or not small D-caches were useful, and
if so, how much, and under what circumstances.

2) One would expect (rightly or wrongly) that overall system design
would heavily influence the benefit level of a small on-chip D-cache.
I.e., one would expect that, for example, turning the D-cache on would
help more in a Sun-3/160 - style design (no external cache) than in
a /260 design (well-designed, fast external cache). (Expectations could
be wrong, but...)

3) Data would help: whenever 68030 systems become widely available,
and especially if there are convenient ways to turn the caches on/off,
people could do comp.arch a large service by:
   a) running large, realistic benchmarks
   b) reporting the results
   c) reporting the clock speed and overall memory system configuration.

Until then, all we've got to go on is indirect reports, having no idea
what sorts of benchmarks and configurations are being tested.

Alan: can you possibly offer more of the details, or is it still proprietary?
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
Re: Harvard Architecture (really 680x0 caches)
Author: radford@calgary.
Date: Sun, 20 Mar 1988 19:24
26 lines
1186 bytes
In article <2594@pdn.UUCP>, alan@pdn.UUCP (Alan Lovejoy) writes:
> The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
> cache turned on (compared to its being turned off). Apparently the
> '030 gets **AT LEAST** a 30% performance boost from turning on the data
> cache (so I have been told by those who have benchmarked one).
>
> Enough said.

This is a bit hard to believe, seeing as the 68020 takes two cycles to
access a word from cache, and only three to access the same word from
external memory (if my memory serves me right).

Of course, the external memory might be slow, and require wait states to
be added to the three cycles. If you're willing to go for that, however,
I'm sure I could build a system in which turning on the cache speeds things
up by a factor of ten. (Well, actually, I'm not sure I could, not having
much experience with a soldering iron, but you know what I mean... :-)

What's needed are data on:

   (1) the benefit of the '030's data cache assuming zero wait states
       (and a 32-bit bus),
   (2) the benefit for various other memory configurations,
   (3) the overall benefit in systems typical of various applications.

Radford Neal
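
The cycle-count argument above can be put into a back-of-the-envelope model. This is a hedged sketch: the 2-cycle cache access and 3-cycle external access are the figures recalled in the post (which itself says "if my memory serves me right"), while the function names, the hit-rate parameter, and the assumption that a miss costs exactly one external access are illustrative. It bounds the speedup of fetches only, not of a whole program.

```python
def fetch_cycles(hit_rate, wait_states, cache_cycles=2, bus_cycles=3):
    """Average cycles per fetch for a given hit rate and wait-state count.

    cache_cycles and bus_cycles default to the figures recalled in the
    post above; a miss is assumed to cost one external access.
    """
    miss_cycles = bus_cycles + wait_states
    return hit_rate * cache_cycles + (1.0 - hit_rate) * miss_cycles


def fetch_speedup(hit_rate, wait_states):
    """Ratio of cache-off to cache-on fetch time (fetches only)."""
    uncached = fetch_cycles(0.0, wait_states)
    cached = fetch_cycles(hit_rate, wait_states)
    return uncached / cached


# Best case (every fetch hits) for a few memory speeds.
for ws in (0, 1, 17):
    print(ws, "wait states:", fetch_speedup(1.0, ws))
```

Under these assumptions even a perfect hit rate gives at most 1.5x on fetches with zero wait states, 2x needs one wait state, and the tongue-in-cheek "factor of ten" needs 17, which is why the post asks for the memory configuration behind any quoted figure.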
Re: Harvard Architecture
Author: pf@diab.UUCP (Pe
Date: Mon, 21 Mar 1988 07:41
11 lines
461 bytes
In article <2594@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
>The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
>cache turned on (compared to its being turned off). ...............
>
>--alan@pdn

In our experience, tests have shown that turning off the internal cache in the
"020" results in a 20% slowdown if the external memory doesn't have wait states.

So your figure must be from a system with slow external memory, right ??
Re: Harvard Architecture
Author: chow@batcomputer
Date: Wed, 23 Mar 1988 19:24
25 lines
1255 bytes
In article <373@ma.diab.UUCP> pf@ma.UUCP (Per Fogelström) writes:
|In article <2594@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
||The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
||cache turned on (compared to its being turned off). ...............
||
||--alan@pdn
|
|In our experience, tests have shown that turning off the internal cache in the
|"020" results in a 20% slowdown if the external memory doesn't have wait states.
|
|So your figure must be from a system with slow external memory, right ??

My own tests have also shown that on a Mac II, the 68020 instruction cache
results in a 20% performance change. (I think the Mac II has 1 wait state.)

Christopher Chow
/---------------------------------------------------------------------------\
| Internet: chow@tcgould.tn.cornell.edu (128.84.248.35 or 128.84.253.35)    |
| Usenet: ...{uw-beaver|ihnp4|decvax|vax135}!cornell!batcomputer!chow       |
| Bitnet: chow@crnlthry.bitnet                                              |
| Phone: 1-607-253-6699 Address: 7122 N. Campus 7, Ithaca, NY 14853         |
| Delphi: chow2 PAN: chow                                                   |
\---------------------------------------------------------------------------/
Re: Harvard Architecture
Author: alan@pdn.UUCP (A
Date: Fri, 25 Mar 1988 14:33
28 lines
1339 bytes
In article <373@ma.diab.UUCP> pf@ma.UUCP (Per Fogelström) writes:
>In article <2594@pdn.UUCP> alan@pdn.UUCP (0000-Alan Lovejoy) writes:
>>The 68020 **CONSISTENTLY** benchmarks twice as fast with the instruction
>>cache turned on (compared to its being turned off). ...............
>>
>>--alan@pdn
>
>In our experience, tests have shown that turning off the internal cache in the
>"020" results in a 20% slowdown if the external memory doesn't have wait states.
>
>So your figure must be from a system with slow external memory, right ??

Absolutely correct. 150ns DRAMs to be precise. Using 45ns SRAMs, the
figure is closer to the 20% you quoted (my source gets a 30% difference
with his benchmarks and his compiler). Somehow only the 150ns DRAM figure
stuck in my mind (perhaps because my main interest is personal computers
where 45ns SRAMs are too expensive).

The numbers from the '030 also are for relatively slow external memory and
no external cache. Sorry, but I can't be more specific than that.

But an interesting point is raised here: what's good for a $50,000
workstation may not be so good for a $5000 pc, and vice-versa. What's good
for running UNIX&C may not be good for running Smalltalk, and vice versa.
This is not new information, but the discussion in this group tends to lose
sight of it at times.

--alan@pdn
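
The link between memory access time and wait states mentioned above can be roughed out as follows. This is a sketch under loud assumptions: the post gives only the 150ns and 45ns part speeds, so the 16 MHz clock, the 2-clock access budget, and the function name are all hypothetical, and real designs also lose time to address decoding, buffers, and DRAM cycle (not just access) time.

```python
import math


def wait_states(access_ns, clock_mhz, budget_clocks=2):
    """Estimate wait states for a memory part of the given access time.

    budget_clocks is an assumed number of clocks the bus cycle allows for
    the access before any wait states are inserted (hypothetical value).
    """
    clock_ns = 1000.0 / clock_mhz                 # clock period in ns
    clocks_needed = math.ceil(access_ns / clock_ns)
    return max(0, clocks_needed - budget_clocks)


# Assumed 16 MHz processor clock (62.5 ns period).
print("150ns DRAM:", wait_states(150, 16), "wait state(s)")
print("45ns SRAM:", wait_states(45, 16), "wait state(s)")
```

With these assumed numbers the slow DRAM needs a wait state and the fast SRAM does not, which is consistent in direction with the post: the slower the external memory, the more the on-chip cache appears to help.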