Dauger comments on Power Fractal PPC/Rosetta vs Universal Performance:
Update: on May 10th Dean wrote that he's posted Power Fractal v1.4.1 w/Intel Mac performance improvements:
(5/10/2007)
"I've been optimizing Power Fractal on my new 8-Core Intel. Boy this thing really heats up the room. I'm claiming on the web site over 80 GF (GigaFlops), but often I get 85 GF (which is almost 90% efficiency) and occasionally I get 90 GF.
You can tell your readers I just posted v1.4.1 at: http://daugerresearch.com/fractals/
and try out the new version on their Macs.
Thanks and have fun,
-Dean"
Version 1.4.1 notes "Newly reoptimized SSE code and fractal presets for the new 8-Core Mac Pro, where it can achieve over 80 GigaFlops." (BTW - One of the earlier reports with (carbon) PF v1.4 running under Rosetta on an 8-core Mac Pro noted 70GF with max count settings of 64K and higher.)
(Earlier comments on v1.4 Power Fractal performance follow.)
I had written Dauger about the tests from readers (below) that showed better performance with the carbon(PPC) version under Rosetta than running the Universal (native) version (v1.4 as of April 2007). Dean Dauger replied:
"Thanks very much for emailing. It so happens I read some of the posts yesterday, so I was wondering if someone would email. I think I have an explanation, but I hope your readers will grant me the patience for me to update Power Fractal.
(An early report from an 8-core Mac Pro owner implied under Rosetta it was using more cores, but someone else disputed that (and he later said that both versions maxed the cores out.))
No, I doubt that. Perhaps Rosetta has more supporting background tasks, but, beyond 8 tasks, both versions would still use just 8 cores.
(Altivec was limited to single precision, where Intel SSE I think can do double precision. Are both versions using the same precision math?)
Yes, both single-precision.
The SSE2 instruction set (which is available on all Intel Macs) includes double-precision, but back with the Intel Core Duos the double-precision flops per clock throughput wasn't any better than scalar code, so there was no point. But that could be different now....
(The only other reason I could think of (with same core usage, same precision math, etc.) was that the Intel/Univ. code was less efficient? Since most apps under Rosetta are almost half as fast as native, any ideas on why the Rosetta version is faster (typically)?)
Yes, that is a good question, and it's a discrepancy I'll work on soon. I'll start with a little background on how those two were compiled:
The PowerPC Carbon CFM version of Power Fractal was compiled using CodeWarrior Pro and uses AltiVec code heavily optimized for the G5.
The Universal version of Power Fractal was compiled using the Xcode 2.4 IDE, which uses gcc to compile its code. To work on Intel, a rewrite of the inner loops for SSE was required.
I have a first-generation MacBook Pro with an Intel Core Duo 2.16GHz, where I optimized Power Fractal's SSE code. I quickly discovered that basic hand optimizations, ones that helped PowerPC, made no difference on this Intel chip. Some research on the Intel Core Duo implementation revealed that its SSE was capable of only 2 flops per clock. I'm used to AltiVec's theoretical peak of 8 flops/clock, so I was surprised, but since my modestly optimized SSE code was already achieving 2 flops per clock, clearly any additional effort on my part was futile.
Now the Quad-Core Intel Xeon is available. From what I've read its SSE implementation is much better than in the first Core Duo, perhaps 4 flops per clock. The 2 flop/clock bottleneck of the old chip is no longer, hence optimizations I use for PowerPC could help once again.
So the PowerPC Carbon version of Power Fractal has all those optimizations already there, and it appears Rosetta's AltiVec to SSE translator is extremely efficient, and I've always considered CodeWarrior's compiler to be far superior to gcc, and the Quad-Core Xeon's SSE implementation is much better. So all these advantages have conspired to make that performance disturbingly better.
Meanwhile, the Universal version is using SSE code that maxed out a half-speed SSE implementation and is made using a slower compiler, gcc. Still it's getting almost 2 flops per clock as designed. (2 flops/clock/core * 3 GHz * 8 cores = 48 GFlops)
My confession: I don't have an 8-core yet, but I'll get one soon. (I've been a little busy: my son (our first) was born a few weeks ago.) Once I get my hands on a new 8-core Mac Pro (donations are always accepted... ;D j/k) I'll reoptimize Power Fractal to get as close as I can to its 96 GF theoretical peak.
One more thing: I appreciate so many folks use Power Fractal to test their hardware, so I'm glad to have made a difference there.
Thanks a lot,
Dean Dauger, Ph. D.
President
Dauger Research, Inc.
"
As mentioned in the May 10th update above, Dean has posted v1.4.1 of Power Fractal with improved Intel-based Mac performance.
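The peak figures in Dean's reply come from a simple product: flops per clock per core, times clock speed in GHz, times core count. Here is a minimal Python sketch of that arithmetic, using only the numbers quoted above. The function names are made up for illustration, and the 4 flops/clock value is Dean's "perhaps" estimate for the Quad-Core Xeon, not a measured figure.

```python
def peak_gflops(flops_per_clock, clock_ghz, cores):
    """Theoretical peak in GigaFlops: flops/clock/core * GHz * cores."""
    return flops_per_clock * clock_ghz * cores


def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak actually achieved."""
    return measured_gflops / peak


# 8-core Mac Pro at 3 GHz, as in Dean's reply.
old_sse_peak = peak_gflops(2, 3.0, 8)  # 48.0, the v1.4 Universal design point
new_sse_peak = peak_gflops(4, 3.0, 8)  # 96.0, the peak Dean is aiming for

# Rates Dean reported for v1.4.1 on his 8-core, in GigaFlops.
for measured in (80, 85, 90):
    share = efficiency(measured, new_sse_peak)
    print(f"{measured} GF is {share:.1%} of the {new_sse_peak:.0f} GF peak")
```

By this reckoning 85 GF lands a little under 90% of the 96 GF peak, which matches Dean's "almost 90% efficiency" remark.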
8-Core Temperature Monitor and KillaWatt Readings:
(added 4/25/2007)
"(8-Core Mac Pro w/4GB OEM RAM, 4 hard drives, ATI X1900 graphics card, Tempo SATA E2P PCI-e card)
Regarding your previous question about temperatures (using freeware Temperature Monitor)
Ambient Temperature: 71.6° F
After running Power Fractal for nearly 2 hours, Temperature Monitor reports the following maximum temperatures:
Ambient Air: 75.2°F (It heats the room quite nicely - unfortunately summer is coming!)
Here's a screen capture of the settings at the end of that nearly two hour run of Power Fractal tests:
(BTW - A Mac Pro owner (dual-core Xeon model) was surprised by the CPU temperatures above, but remember the 8-core Mac Pros have quad-core Xeons with twice the cores per CPU chip as the dual-core Xeon based Mac Pros have.)
Also, power usage figures are as follows: (from a Kill-A-Watt meter)
Power up: Max 401 watts then drops to 311 watts
Login Window: 280 watts
During Login: Max 311 watts
Idle: 276 - 281 watts
Power Fractal: 562 - 584 watts
By the time I was finishing all of the above tests, the draw while running the max count of 1048576 for Power Fractal was peaking at about 565 watts - nearer the low end of the range when I first started running. Maybe I burnt in the heat sinks a bit. ;-)
I hope the information is useful. If you have any other tests you'd like me to run, let me know.
-Lew"
As an FYI - a previous post on the earlier Mac Pro memory page from a (dual-core Xeon) Mac Pro user w/OEM 512's noted a FBDIMM max temp of 84°C (183.2°F), although some severe stress tests of 1GB/2GB FBDimms have reported as high as 194°F.
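Since the readings on this page mix Celsius and Fahrenheit, a small conversion helper makes them easier to compare. This is only a sketch with made-up function names; the input values are the ones quoted above.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def f_to_c(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


# FB-DIMM maximum noted by the dual-core Xeon Mac Pro user.
print(f"84 C    = {c_to_f(84):.1f} F")
# Highest FB-DIMM figure reported from severe stress tests.
print(f"194 F   = {f_to_c(194):.1f} C")
# Lew's room before and after the two hour Power Fractal run.
print(f"71.6 F  = {f_to_c(71.6):.1f} C")
print(f"75.2 F  = {f_to_c(75.2):.1f} C")
```

The first line reproduces the 84°C / 183.2°F pairing given in the note above.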
More on 8-Core Power Fractal Performance (PPC/Rosetta vs Univ.) (**NOTE** - the results below were using v1.4 of Power Fractal, not the later v1.4.1 version with improved native Intel code performance. Version 1.4.1 is available now at http://daugerresearch.com/fractals/.)
Lew's latest notes on Power Fractal (v1.4) Performance in reply to a dual-core Xeon Mac Pro owner's notes doubting the Gigaflops he mentioned in his earlier comments on Tuesday (below).
(added 4/25/2007 - updated w/more notes)
"Some additional info on my 8-core:
- 4GB of OEM RAM
- Two 500 GB Maxtor drives
- One 500 GB Seagate drive
- One 750 GB drive
- ATI X1900 video card
- Tempo SATA E2P PCI-Express eSATA card
Here are the settings I was using in Power Fractal (v1.4): (Please note I'm using the PPC version NOT the Universal!)
Color Speed: 10
Maximum Count: 4096
Zoom Factor: 2
Gallery: Four Color Patchwork
Window Size: Fit Window to Main Screen (30" Cinema Display - 2560x1600)
Parallel: Run as Single Node
The rates aren't always exactly the same, but usually are pretty consistent. With those settings all 8 cores are used at max and I got a rate of 62489.0 MegaFlops, i.e. 62.4 GigaFlops - what 10 years ago would have definitely been considered supercomputer performance. :-)
Changing Maximum Count to the maximum value maxes out all the cores and kept them maxed out for 1 hour, 56 minutes and 52 seconds when it reported a rate of 9206.2 MF.
Changing the Window Size to default (640x480) it ran for 12 minutes & 20 seconds and reported a rate of 17719.5 MF.
Going back to Window Size of Fit Window to Main Screen (2560x1600) I started playing with the Maximum Count value:
Power Fractal v1.4 Carbon/Rosetta (8-core Mac Pro)
And I got tired of running the test at that point! :-}
I did retry the universal version, full screen and max count of 4096, and it took 2.6 seconds and achieved 45723.5 MF. (that's worse than the PPC/Rosetta test numbers!?)
I wonder what the author of Power Fractal would say about that. (Bugs?, lower precision math used under Rosetta? (Altivec only does single precision math IIRC, but that's not an issue as I found out later that PF only uses single precision.) Or does the Intel native version have some code inefficiencies?)
I suspect that Rosetta is making better use of the multiple cores than the universal version. (Does it show more cores used under Rosetta?)
When I was running the tests last night, Power Fractal running under Rosetta was using all 8 cores and had them all maxed out. So it seems to be handling the threading pretty well.
(A Quad-Core Mac Pro owner asked if he ran the Univ. version at max count to see if it also maxed out the Cores.)
When I run either the PPC (under Rosetta) or the Universal version of Power Fractal it maxes all of the cores.
I don't have CS2 installed on the 8-core, so I haven't been able to see how well it makes use of the 8 cores. I don't think that Rosetta uses less precision on the math either.
And the universal version requires Pooch to use multiple processors (I don't think that's true. See below.) - which probably works fine with processes that run longer, but it certainly doesn't with short ones.
(The Power Fractal home page notes that Pooch is only needed if you're using Clustered Machines (Running it on multiple machines in parallel).)
I suppose I could try running the universal version with Pooch and set the Maximum Count to the max setting and see if it runs faster. But since all of the processors appeared to be maxed out, I don't see how it could speed it up much. And 70 GigaFlops on a desktop machine is pretty speedy! :-)
Running Temperature Monitor at the same time knocks the speed down to 62012.0 MF. (see above for his Temperature Monitor readings)
(he later wrote)
Running either version, when I select "Automatically launch onto four local nodes" under the Parallel menu (instead of choosing "Run as a single node") I get the following message: (Note the "and other Machines on your network"...)
"Pooch could not be found. Please run the Pooch Installer on this and other machines on your network to install Pooch."
If I then click on the "Opps..." button, it goes ahead and runs as a single node.
(The Parallel menu is for running a cluster (of machines) setup. I get the same items under the Parallel menu on this single core/single CPU PowerBook G4. See the Power Fractal page for notes on Clustering/using Pooch to run in parallel. Here's a clip from the Power Fractal page.)
"To run this app on a single Intel- or PowerPC-based Mac, no additional
software is necessary.
To run this in parallel on Intel and PowerPC Macs, you will need Pooch.
Version 1.1 includes automated "Computational Grid" launching on a Mac
cluster running version 1.1 of Pooch.
See the Pooch Quick Start for instructions on configuring your Macs for parallel computing"
Pooch will actually run it as four jobs on the 8-core, but the overhead is such that it doesn't finish spawning the additional jobs until after the run would have been done without using Pooch. I suppose I could see what happens with my 8-core, G4, MacBook Pro, and Mac mini. :-D
Running with the straight default values for PPC and Universal the PPC completes in .2 seconds and achieves 54066.5 MegaFlops while the Universal completes in .3 seconds and achieves 36045 MegaFlops.
Why the difference and seemingly backwards results is beyond my level of expertise. I'm just reporting the results I get. Your mileage may vary. ;-} (Other readers with Intel-cpu macs noted the same thing today - I wrote Dauger to ask for their take on why.) Hopefully Dean Dauger will be able to shed some light on the issue. (Update - see their reply above for notes on this.)
By the way, one thing I've noticed with Temperature Monitor is how much more accurate the sensors seem to be compared to the ones in a PowerBook. The ambient temperature is within half a degree of the digital thermometer I have in the room.
-Lew"
I wrote Dauger Research to ask if they could comment on the higher scores of the carbon version under Rosetta vs running the Universal version. (Update - see the author's reply above and later notes on the v1.4.1 update.)
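To put Lew's v1.4 numbers side by side, the sketch below converts his reported rates from MegaFlops to GigaFlops and computes how far ahead the Carbon/Rosetta version was in each pairing. The helper names and run labels are made up for illustration; the rates are the ones Lew reported above.

```python
# (label, Carbon/Rosetta rate in MF, Universal rate in MF), from Lew's v1.4 runs.
runs = [
    ("Full screen (2560x1600), max count 4096", 62489.0, 45723.5),
    ("Default settings", 54066.5, 36045.0),
]


def mf_to_gf(megaflops):
    """Convert MegaFlops to GigaFlops."""
    return megaflops / 1000.0


def rosetta_advantage(rosetta_mf, universal_mf):
    """Ratio of the Carbon/Rosetta rate to the Universal rate."""
    return rosetta_mf / universal_mf


for label, rosetta_mf, universal_mf in runs:
    ratio = rosetta_advantage(rosetta_mf, universal_mf)
    print(
        f"{label}: Rosetta {mf_to_gf(rosetta_mf):.1f} GF, "
        f"Universal {mf_to_gf(universal_mf):.1f} GF, ratio {ratio:.2f}x"
    )
```

Both pairings show the Carbon version ahead under Rosetta, by roughly a third at full screen and by about half at default settings. Runs this short (a few tenths of a second at defaults) are likely noisy, so the ratios are best read as rough indications.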
Here are results with PF v1.4 from the owner of a Quad-Core (2x Dual-core Xeons) Mac Pro, who did get better results with the Universal version when set to the max count (16,777,216), but with lower counts/defaults Rosetta scored higher:
"
Here's my results:
Power Fractal v1.4 (Universal, run as Intel Native):
Max Count 4096 (Default): 20.2 Gigaflops Avg.
Max Count 16,777,216 (Max): 8.4 Gigaflops Avg.
Power Fractal v1.4 (Carbon, PPC under Rosetta):
Max Count 4096 (Default): 25.6 Gigaflops Avg, 26.8 Gigaflops peak.
Max Count 16,777,216 (Max): 8.1 Gigaflops Avg.
Max Count 4096 w/Altivec Disabled: 2.79 Gigaflops Avg...
-Chibi D."
8-Core Mac Pro user notes on CPU Usage/Pro Apps and Power Use
(from 4/24/2007 news page)
"...I'm gening some video for a DVD on it so that should give a fairly heavy and constant load. BTW, Final Cut Pro & Compressor use about 450% of the CPUs. I have You Control installed and there are 8 little CPU monitors in my menu bar. They all are running when I use compressor, but only at around 50% each. Activity Monitor shows Compressor using around 350% to 370% and Final Cut Pro using around 80% to 100%.
Power Fractals will pop max it out pretty well, but it is only for a few seconds. Even with the 30" Cinema display, it doesn't take long to fill the screen when it is doing 62 GigaFlops! :-)
(Note: he later wrote he was using the PPC version of Power Fractals (under Rosetta) - which he later said was faster and scored higher than the Univ. version. See later report/notes above.)
I have one of the KillaWatt meters and while the manual says that the 8-core can use up to 1200 watts, I haven't had it go over 500 watts yet. Actually, at full idle (not asleep) it runs at around 250 watts. When I hit it with Power Fractals it goes up to about 460 watts and then drops right back down.
-Lew K."
A (dual core Xeon) Mac Pro owner replied to some of the comments:
"
Just thought I'd drop in and put in a sidenote to Lew K.'s response (above). To get PowerFractal to really REALLY use his 8 cores, set the Maximum Count to the bottom-most setting (16 million) in the menu and then have him say the 8 core mac finishes right away. Trust me, it won't. It takes my 4 core Mac Pro a fair amount of time - in the range of 15-20 *minutes* to complete the image at the max setting with the default color speed. To give you an idea of how long it takes, the window is in the background as I type this email and only 10% has completed as of this sentence. ^_^
Not sure where he's getting the 62 Gigaflops from either, as the 4 core Mac Pro maxed out does significantly less than 20 gigaflops (default quick mode does roughly 20 gigaflops, maxed out does much less).
-Chibi
"
I wrote Lew to ask about this - see his later reply w/scores above.
OS X 10.4.9 Build Number, Notes on CPU reporting after 2007-004 Security Update:
I mentioned that new Macs often ship with later builds of OS X than the public release of the same version. An 8-Core Mac Pro owner replied:
"The Mac Pro 8-Core does have a later build.
[Ocho:~$] sw_vers
ProductName: Mac OS X
ProductVersion: 10.4.9
BuildVersion: 8P4037
[Ocho:~$]
Standard 10.4.9 is reported as build 8P2137.
(You can also get the build number by clicking on the "About this Mac" OS X version number.)
Applying 2007-004 Security Update causes "about this mac" to report "unknown" 2 x 3GHz but System Profiler and CHUD Processor shows proper information.
(Another 8-Core Mac Pro owner said he saw the same thing after the 004 security update - unknown processor reported in About this Mac.)
Cosmetic only it appears and no change in stability or performance (knock on wood!).
Good news is that it is "only" software or firmware issue probably, rather than hardware which would be more serious I guess.
I hope George is able to get his hands on 8-core. I think the switch to Intel and EFI may have been a difficult one for FirmTek.
Thanks for your site and keeping us informed.
-Gregory"
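For anyone comparing builds across several machines, the `sw_vers` output shown above is easy to parse. The sketch below works on a pasted copy of Gregory's output rather than calling `sw_vers` directly, so it runs anywhere. The function and constant names are made up for illustration.

```python
# Output of sw_vers as posted by the 8-core Mac Pro owner above.
SAMPLE = """ProductName: Mac OS X
ProductVersion: 10.4.9
BuildVersion: 8P4037
"""

# Build number of the public 10.4.9 release, per the report above.
STANDARD_BUILD = "8P2137"


def parse_sw_vers(text):
    """Turn 'Key: value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip any line without a colon
            fields[key.strip()] = value.strip()
    return fields


info = parse_sw_vers(SAMPLE)
print("OS X", info["ProductVersion"], "build", info["BuildVersion"])
print("Differs from public build:", info["BuildVersion"] != STANDARD_BUILD)
```

On a Mac, the same function could presumably be fed live output, for example from `subprocess.run(["sw_vers"], capture_output=True, text=True).stdout`, assuming a Python 3.7 or later install.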
Another 8-Core Mac Pro owner wrote to confirm the above:
"
I can confirm the two builds already on your site (8P2137 - ok with firmtek - and 8P4037 - no go with the sata board). I can also confirm that applying the security update I also have the "unknown" processor issue. :(
-Maurizio
"
Seritek 2SE2-E eSATA card in 8-core Mac Pro (Remember to install Drivers from CD)
(Update: The next day he wrote that he'd forgotten about an Install CD with drivers for the Seritek card that have to be installed on the Mac Pro's boot drive.) His earlier mail follows:
(Updated comments first)
"Ok, it seems that the drivers were on the install CD. I mistakenly gave for granted that nothing had to be installed as this was the case with my older G5. It seems that with the MacPro this is not the case. George Rath wrote me and kindly explained that I was a little stupid... (The drivers used are FT_ATA_Sil3132E.kext
and Sil3112DeviceNub.kext)
Anyway he sent me a newer firmware (5.2.5) just to let me think that I'm stupid, but not a total idiot...
(His earlier mail follows - before he had installed the missing drivers)
I have a new MacPro 8-core and the Seritek 2SE2-E. The card has firmware 5.2.0 (nov 18 2006) and cannot get recognized by the computer...
With the application "Expansion Slot Utility" I can see the board as "unknown other mass storage controller" but anyway it's not usable by the computer.
I've written to FirmTek but they are unaware of this issue and they have passed the question to George Rath who's in charge of the firmware. I can't really use the new mac until I get this working as I heavily rely upon external disks for my work. I would be happy to hear a second voice. Could you please ask if someone else can test the combo just to see if they work together and please report also firmware versions? Thank you in advance.
The card is perfectly ok with a previous 3GHz 2x2 (dual core) MacPro. Starting from a clone of my old MacPro (dual core) disk installed in the new quad core, the card works! If I use a fresh install from the factory disk, I cannot get the card working. I'll report this to FirmTek now, hope that they investigate. I'd frankly prefer to use a fresh install, not the clone of my old computer...
Best regards,
Maurizio C.
"
(Firmtek later replied with a reminder to install the drivers on CD.) I also wondered if there are any OS changes in the 8-core OS X build/install (shipped from Apple). Sometimes new macs ship with a later build (w/changes) than the public release of the same OS X version. (Update - see later post above with info on 8-core Mac Pro OS X 10.4.9 build version and comments on the 2007-004 security update.)
Another 8-Core Mac Pro owner wrote:
"
Concerning the problem that Maurizio C. had with the Seritek 2SE2-E card (turns out he just needed to install the drivers from CD.), I have an 8-core Mac Pro and here are two possible solutions:
1) I purchased a Sonnet PCI Express 2-port eSATA card from OWC that is working fine for me. (I assume he's using the Tempo SATA E2P.)
2) There are supposedly two unused SATA ports in Mac Pros (present since the first models.) and at least one company is making a plate with two eSATA sockets to replace an empty PCI slot plate on the back panel. There are cables to extend the two unused ports to the back panel ports. Unfortunately, I don't recall where I saw it. :-\
(He's referring to the Newer Tech cable kit, mentioned in the April 19, 2007 news. There's also an article here on Using the NT SATA Cable Kit with an External SATA dual-drive case kit.)
-Lew K.
"
Other Mac Pro related Articles: - See the