Author Topic: New ppc board by Acube/A-Eon: A1222 "Tabor"  (Read 50239 times)

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: New ppc board by Acube/A-Eon: A1222 "Tabor"
« Reply #14 from previous page: October 19, 2015, 05:25:17 PM »
Quote from: Spectre660;797651

By the way, Trevor did confirm in his Amiwest 2015 presentation that the T series CPUs were not available when the Tabor design was started.


Maybe we need a poll. Which of the following choices best applies to the P1022 PPC CPU choice without a compatible PPC FPU for the Tabor motherboard?

1) poor decision
2) bad decision
3) desperation

Is the writing on the wall not clear enough? It says, "PPC is dying and the remaining choices are slower, more expensive and/or handicapped". Freescale being bought out makes it easier for the new parent company to say we didn't promise anything regarding PPC. If there is not desperation yet then there could be at any time and very likely will be in the future. If not wanting to give up PPC then diversifying outside of PPC could allow Hyperion and/or A-Eon to survive the demise of PPC (if it is not already too late). Maybe Amiga Inc. understands and is patiently waiting to get their now "developed" intellectual property back for free.

Quote from: Iggy;797653

I rather suspected that the e500 core would outperform an Applied Micro core.
So it is a move forward, especially in consideration of the fact that it is a dual core cpu.

That 30% will more than cover any deficit that fpu emulation will cost.


30% faster integer performance will make some applications feel snappier while others with heavy floating point use (like Blender) will likely run at a fraction of the speed of a PPC with standard FPU. It's kind of like hiring a motivated worker who is missing one arm. He may be a little faster than average at some jobs but is going to have major trouble doing some work.
 

Offline matthey

Re: New ppc board by Acube/A-Eon: A1222 "Tabor"
« Reply #15 on: October 19, 2015, 08:42:24 PM »
Quote from: Spectre660;797671
I have run AmigaOS 4.1 Update 6 on Sam440ep-Flex, FE on Sam460ex.
Linux on both Sams and the Tabor.
I have run Both the Special FPE Debian on the Tabor and the regular
Debian with the emulated floating point on the Tabor.
The Tabor is much faster than the Sams under Linux .


Linux is probably using both cores of the Tabor CPU so it would be significantly faster at integer math than the SAM. FPU trapping probably isn't going to be noticeable for light floating point use while being devastating to programs with heavy floating point use. Run an MFLOPS benchmark under Linux using the traps on the Tabor board and let us know how it performs. The AMCC (SAM) 460EX gives 2.0 MFLOPS/MHz. I would be surprised if the P1022 can do 10% of the SAM's MFLOPS with trapping. Traps have a huge amount of overhead (20+ cycles per trap) and allow no superscalar parallelism.
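
To put rough numbers on that expectation, here is a back-of-the-envelope sketch. It assumes every FP instruction traps, is emulated serially and yields one floating point operation. The 20-cycle trap overhead and the 2.0 MFLOPS/MHz figure come from the post; the 30-cycle emulation cost is a made-up placeholder, not a measurement, and clock speed differences between the boards are ignored.

```python
def trapped_mflops_per_mhz(trap_overhead_cycles, emulation_cycles):
    """Estimate MFLOPS/MHz when each FP instruction traps and is emulated one at a time."""
    cycles_per_flop = trap_overhead_cycles + emulation_cycles
    return 1.0 / cycles_per_flop

SAM460_MFLOPS_PER_MHZ = 2.0  # figure quoted in the post

# 20 cycles: lower bound for trap overhead from the post.
# 30 cycles: hypothetical emulation cost, chosen only for illustration.
estimate = trapped_mflops_per_mhz(20, 30)
ratio = estimate / SAM460_MFLOPS_PER_MHZ
print(f"{estimate:.3f} MFLOPS/MHz, {ratio:.1%} of the SAM 460 per MHz")

# Even with a zero-cost emulation routine the trap overhead alone caps the result.
best_case_ratio = trapped_mflops_per_mhz(20, 0) / SAM460_MFLOPS_PER_MHZ
print(f"best case with free emulation: {best_case_ratio:.1%}")
```

Under these assumptions the trap overhead by itself keeps the per-MHz result well under the 10% mark, whatever the emulation routine costs.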

Quote from: Iggy;797672
Well, one vote for 'poor decision' but not a lethal one.


I'll wait to place my vote but I have low expectations. My scale would go something like this:

0) acceptable decision as better than half the FLOPS of the SAM 460
1) poor decision as 1/2 - 1/4 the FLOPS of the SAM 460
2) bad decision as 1/4 - 1/8 the FLOPS of the SAM 460
3) desperation as worse than 1/8 the FLOPS of the SAM 460
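
For anyone who does run the benchmark, the scale above can be written as a small lookup. This is only a sketch: the function name is made up, and how the exact boundary values (1/2, 1/4, 1/8) are assigned is an assumption since the scale does not say.

```python
def rate_decision(flops_ratio):
    """Map (Tabor FLOPS / SAM 460 FLOPS) to the scale from the post.

    Assumption: a ratio exactly on a boundary goes to the range that names it,
    e.g. exactly 1/2 counts as "poor" because "acceptable" needs better than half.
    """
    if flops_ratio > 1 / 2:
        return "0) acceptable decision"
    if flops_ratio >= 1 / 4:
        return "1) poor decision"
    if flops_ratio >= 1 / 8:
        return "2) bad decision"
    return "3) desperation"

for r in (0.6, 0.3, 0.2, 0.05):
    print(r, "->", rate_decision(r))
```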

Quote from: Iggy;797672

Is there an NG port of Blender?


Andy Broad did a port of Blender to AmigaOS 4.

http://www.broad.ology.org.uk/amiga/blender/

Quote from: Iggy;797672

I can't picture it working well on a single core system with memory limitations.


What makes you think the 68k can't do multi-core? It is not much more difficult than copy and paste of an existing FPGA CPU core. The FPGA has an advantage of being able to deal with problems like the executive base forbid counter in a more compatible way. I believe the 68k in FPGA has a better chance of preserving compatibility while exceeding the 4GB addressing barrier. The PPC cannot use 64 bit pointers and keep compatibility with AmigaOS 3 or AmigaOS 4 because structure sizes would change. Better code density (smaller code size) is an advantage with a memory extension like XMS for the PC. How many Amiga 68k users complain about 128MB of memory not being enough compared to Amiga PPC users complaining about 512MB not being enough? I wonder how much memory the Tabor FPU trapping support code will take up?
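
The structure size point can be illustrated with a short sketch. The layout is modeled on exec's struct Node (two link pointers, a type byte, a priority byte and a name pointer); the 64 bit variant is hypothetical and packed without padding, so a real compiler could make it even larger.

```python
import struct

# ">" selects big-endian with standard sizes and no padding.
NODE_32 = ">IIBbI"  # successor, predecessor, type, priority, name (32 bit pointers)
NODE_64 = ">QQBbQ"  # same fields with hypothetical 64 bit pointers

size32 = struct.calcsize(NODE_32)
size64 = struct.calcsize(NODE_64)

# Offset of the name pointer = size of everything that precedes it.
name_offset32 = struct.calcsize(">IIBb")
name_offset64 = struct.calcsize(">QQBb")

print(f"32 bit pointers: {size32} bytes, name pointer at offset {name_offset32}")
print(f"64 bit pointers: {size64} bytes, name pointer at offset {name_offset64}")
```

Any existing binary that hard codes the 32 bit size or field offsets would read the wrong bytes from the wider structure, which is the compatibility break described above.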
 

Offline matthey

Re: New ppc board by Acube/A-Eon: A1222 "Tabor"
« Reply #16 on: October 19, 2015, 10:29:41 PM »
Quote from: itix;797692
In modern computing code density and code size have very little to do with memory requirements. PPC code requires 50% more space (versus 68k code) but we are still talking few hundred kilobytes.

It is the data that counts.

It depends on the programs (streaming data needs little memory and no DCache, for example), and sometimes 68k "data" is smaller too because of fewer alignment restrictions. Code density matters more for processor performance, where the 68k can hold 40%-50% more code in the ICache, and for bandwidth, which allows code to be transferred 40%-50% faster. DCaches are rarely larger than ICaches despite some programs using much more data. Memory is cheap nowadays but even tens of MBs are important to the AmigaOS, where 64 bit pointers will break compatibility. Code density is important enough that most modern 32 bit mobile and embedded devices use ARM with Thumb 2 or Android's Dalvik byte code, which have good code density but still inferior to the 68k.

IMO, it makes sense for the Amiga to leverage all the advantages of compatibility and a small footprint with the 68k using 32 bit for the low end. The high end could break compatibility by converting to 64 bit pointers while adding SMP and using a sandbox for AmigaOS 3 and 4 compatibility, but I don't think there is enough market for it currently, especially given the price/performance and future prospects of PPC. The same technology used to make an enhanced 68k CPU (and learn from it) could be used to make a new 68k like 64 bit SuperCISC ISA and CPU design (if necessary) which is better than x86_64 (an average ISA at best, while the CISC advantages that give the most powerful consumer processors in the world are continuously overlooked). It would require some investment but at least the Amiga could control its destiny, standardize and innovate instead of being dependent on the last small customer PPC manufacturer and aging embedded designs.
« Last Edit: October 19, 2015, 11:27:48 PM by matthey »
 

Offline matthey

Re: New ppc board by Acube/A-Eon: A1222 "Tabor"
« Reply #17 on: October 20, 2015, 06:10:29 AM »
Quote from: broadblues;797698
Ridiculous thing to say, modern memory usage is all about data, code is like 1% of it.

eg SketchBlock is 3.5MB unstripped (plus some small libraries and filters) but as I'm working in quad float pixels I can max out on 1.5Gb of free ram when dealing with modern digital camera images.


The AmigaOS libraries and devices are probably 95% code. They have to be counted too. The AmigaOS requirements:

AmigaOS 3.1 no requirement, 2MB recommended
AmigaOS 3.5 4MB required, 8MB recommended
AmigaOS 3.9 6MB required, 8MB recommended
AmigaOS 4.0 64MB required
AmigaOS 4.1 Classic 96MB required

Many classic users consider AmigaOS 3.9 to be bloated but PPC requires another ~60MB of memory. Why does PPC need another ~60MB for "data"?

What kind of floating point range and precision are needed to require quad precision floating point in SketchBlock? If only increased range is required then extended precision should be adequate (both have 15 bits of exponent) but extended precision only has 11 more bits of precision than double precision while quad precision has 60 more bits of precision.
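
The bit counts above can be checked with a few lines. The figures are the significand precision (counting the leading bit) and exponent width of IEEE double, 80 bit extended and IEEE quad; the dictionary and variable names are just for illustration.

```python
# Significand precision (including the leading bit) and exponent width, in bits.
FORMATS = {
    "double":   {"precision": 53,  "exponent": 11},
    "extended": {"precision": 64,  "exponent": 15},
    "quad":     {"precision": 113, "exponent": 15},
}

base = FORMATS["double"]["precision"]
extra_bits = {name: f["precision"] - base for name, f in FORMATS.items()}

for name, f in FORMATS.items():
    print(f"{name:9s} exponent={f['exponent']:2d} bits, "
          f"precision={f['precision']:3d} bits (+{extra_bits[name]} vs double)")
```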

Quote from: broadblues;797698

blender needs about 100MB to start but only 10% of that is code.


Data used once, infrequently and streamed shouldn't count as it isn't necessary to be persistent in memory. Code in the AmigaOS which is persistent should be counted as code used.
 

Offline matthey

Re: New ppc board by Acube/A-Eon: A1222 "Tabor"
« Reply #18 on: October 20, 2015, 08:23:30 PM »
Quote from: itix;797705
It doesnt really matter when my PPC is still much faster at executing code (68k or PPC) than a real 68060.

Even an affordable FPGA 68k CPU core can be faster at executing integer code than a 68060. It just shows how old the 68060 design and fab process is. The most modern PPC processors blow away the original Pentium processor but we know the PPC vs x86 situation today.

Quote from: itix;797705
With 68k code you dont save tens of MBs. You could port the latest OWB to 68k and it still would not run well with 128 MB. 68060 is still too slow to run the latest webkit engine, 128 MB is still too little to run the latest webkit engine.

With less memory installed, less memory is saved. I would add 1GB-2GB to a 68k standalone board as it is relatively cheap. The AmigaOS (mostly code) overhead must be counted in determining free memory for applications and this is substantially higher for the PPC as I have shown.

Quote from: itix;797705
But the small footprint is just due to lack of features. Lack of Unicode support, no built-in USB stack, rudimentary GUI toolkit (unless you install MUI but then it is not small footprint anymore), lack of text antialiasing and truetype font support, simple 4 colour icons (versus true color PNG icons) i.e. it is stuck in 90s.

The AmigaOS uses shared libraries which can be dynamically loaded and flushed as needed. The AmigaOS allows many settings (preferences) which can reduce memory use. I use a 800x600x16 RTG Workbench with AmigaOS 3.9, use PFS with lots of partitions and buffers, use Peter K's icon.library which allows planar, Glow, PNG and AmigaOS 4 style icons, have a system friendly truetype engine, use MUI and ReAction GUIs, etc. but only a few MB of memory are taken at boot and I can use most of the features above together in 16-32MB of memory.

Quote from: itix;797705
I dont see point in new super 68k but I get your idea. The PPC is obviously stuck in year 2005 forever.

A new clean 64 bit 68k like big endian CISC ISA could allow easier Amiga 68k and PPC migration while improving processor efficiency. The x86_64 ISA and processors would probably be close enough for migrating to if big endian was supported. ARMv8 is similar to PPC and bi-endian but the hardware is usually customized for customers which license and control their synthesizable FPGA code in a similar way as I have proposed for the Amiga but with a processor team developing CPU cores (instead of paying license fees to ARM). There are many advantages to this route as I have pointed out but it requires investment and quantity production.

Quote from: broadblues;797711
Not sure where you get your 60 MB number from but:

3.1 plain backdrop 4 colour icons. 24 pixels square icons, really small HD at most with few disk buffers and no caching

4.1 full colour backdrop 32bit icons 64 bit square, potentially terabytes of HD space, many more disk buffers per partition, other caching (SFS has write through caching I think FFS2 has cache hooks that may be enabled) etc etc.

For the AmigaOS 4.0 "at least 64 MB RAM" requirement, my reference was:

http://www.vesalia.de/e_amigaos4.htm

Most later versions of AmigaOS 4.x are for specific hardware and do not list a minimum memory requirement. Your comparison compares a minimal AmigaOS 3.1 setup to a well equipped AmigaOS 4.1 setup. The minimum requirement mentions "200MB free hard disk space" so disk buffers aren't going to be outrageous. Can AmigaOS 4.x use 8 bit gfx without a backdrop at least? What happened to the advantages of a scalable AmigaOS with a small footprint?

Quote from: Yasu;797739
According to the AmigaOS 4 developer "Cyborg", lack of FPU is a problem, they will fix it with emulation, it will be slow but they have a plan to eventually speed it up a lot with a JIT emulation. Sounds sensible. Maybe we can stop arguing now?

Emulation (and especially JIT) of FPU instructions instead of trapping should substantially improve floating point performance but it is a complex solution prone to errors. How much development time is going to be wasted by using a non-standard FPU? How much processing power and memory usage will JIT take for a low end motherboard? Limited Amiga software development would benefit from standardized and reduced hardware options but we seem to be headed in the other direction.

Quote from: aperez;797744
Listen, trollmaster...statements such as this do not contribute anything of value to this discussion. Give it a rest. Don't you have something more constructive you could be doing?

I could do some constructive Amiga work but the Amiga situation takes away my motivation.

Quote from: aperez;797744
I'd like to steer this conversation back towards reality for a moment... If someone buys a dual-core PPC32 machine and expects to be running Blender on it, well, that's just crazy. This isn't a machine for that. The recommended hardware for Blender at the moment is a 64-bit, quad-core CPU with 8GB RAM. This is not the type of hardware you would EVER seriously consider running Blender on.

Let's get back to reality then and remember that AmigaOS 4 only supports one CPU core and can't take advantage of 64 bit addressing without breaking Amiga compatibility. I guess you should tell Andy to forget his AmigaOS port of Blender then.
« Last Edit: October 20, 2015, 08:34:57 PM by matthey »
 

Offline matthey

Re: New ppc board by Acube/A-Eon: A1222 "Tabor"
« Reply #19 on: October 21, 2015, 05:35:14 PM »
Quote from: Spectre660;797827
Interesting paper on floating point emulation.
http://www.ll.mit.edu/HPEC/agendas/proc08/Day1/11-Day1-PosterDemoA-Spetka-abstract.pdf


That is the same article I linked to in post #91 of this thread. I'm pleased that someone read it and found it "interesting". The 3D bar graph shows that the fully trapped FPU instructions (FastFPE and NWFPE) take at least twice as long to execute on average as compiling with software FP which is considered slow compared to hardware FP. It looks like there is not much difference on the bar graph for basic FP calculations but this is probably due to lack of resolution for smaller measurements. Complex FP instructions (in hardware or software emulation) usually use FP (often polynomial approximation) equations using basic FP math (FADD, FSUB, FMUL, FDIV, FABS, FSQRT, FINT, etc.) so the complex instructions are composites of many simple calculations but without using traps. Programs using heavy FP calculations would use traps more frequently so they would have more overhead than the complex FP instructions but using the complex FP instructions would reduce this additional overhead :).
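
A small sketch of the "composite" idea: a sine approximation built only from basic multiplies and adds, with a counter standing in for the FPU. This is a plain Taylor polynomial in Horner form for small arguments, used purely for illustration; real FPUs and math libraries use range reduction and tuned polynomials, and the class and function names here are made up.

```python
import math

class OpCounter:
    """Stand-in for an FPU that counts basic operations (hypothetical helper)."""
    def __init__(self):
        self.count = 0

    def add(self, a, b):
        self.count += 1
        return a + b

    def mul(self, a, b):
        self.count += 1
        return a * b

# sin(x) = x * (1 - x^2/3! + x^4/5! - ... ), truncated after the x^10 term.
COEFFS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(6)]

def poly_sin(x, fpu):
    """Evaluate the truncated series in Horner form using only mul and add."""
    x2 = fpu.mul(x, x)
    acc = COEFFS[-1]
    for c in reversed(COEFFS[:-1]):
        acc = fpu.add(fpu.mul(acc, x2), c)
    return fpu.mul(acc, x)

fpu = OpCounter()
approx = poly_sin(0.5, fpu)
print(f"poly_sin(0.5) = {approx:.12f}, math.sin(0.5) = {math.sin(0.5):.12f}")
print(f"basic FP operations used: {fpu.count}")
```

If each of those basic operations were a separately trapped instruction, every one would pay the trap overhead. A single trapped complex instruction pays it once and does the basic math inside the handler.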
 

Offline matthey

Re: New ppc board by Acube/A-Eon: A1222 "Tabor"
« Reply #20 on: October 21, 2015, 07:10:58 PM »
Quote from: Spectre660;797831
So the end results are based on the actual method used.


Yes, depending on restrictions and requirements of the method used. IEEE compatibility and similar accuracy and behavior as the standard PPC FPU are requirements for emulating or trapping the PPC FPU without problems. There are many ways to get floating point support with the P1022 CPU with widely varying performance.

1) Best: Recompiling software to use the P1022 GPR FPU should give good FP performance but the executable will only work on the Tabor board. This is hardware FP.
2) Good: Recompiling software to use software floating point should give fair FP performance and should work on standard PPC hardware with or without an FPU. This is software FP usually using a floating point library provided by the compiler.
3) Poor: Emulation or JIT emulation of the standard PPC FPU gives fair performance without recompiling but with much development time, overhead and resources used for the emulation and possibly bugs or problems due to the complexity. The results of this method were not shown in the article you linked.
4) Bad: Trapping all the standard PPC FPU instructions and emulation of the standard FPU has major overhead for each trapped instruction but recompiling is not needed and it is simpler than (JIT) emulation. The results of this method would be the FastFPE and NWFPE method in the article you linked.
 

Offline matthey

Re: New ppc board by Acube/A-Eon: A1222 "Tabor"
« Reply #21 on: October 21, 2015, 11:27:57 PM »
Quote from: nicholas;797861
Perhaps having the entire OS as shipped by Hyperion recompiled specifically for this CPU and then just having trapping for third party binaries that the end user installs might be a good compromise?

Most of the AmigaOS does not need floating point so only a few modules need to be recompiled. A modified version of AmigaOS is already needed for standard PPC FPU trap support and emulation. This is not a problem.

Quote from: nicholas;797861
Plus third party devs can always ship two binaries with their future releases as was the case with 040/060 specific builds in the past.

Right. It's more work and time for developers but this is the best option when there is so much difference in floating point performance between full trapping and recompiling. The partial FPU trapping of the 68040 or 68060 for 6888x instructions usually provides better performance than the 6888x and usually with less than a 20% performance loss compared to recompiling for the 68040 or 68060 FPU.
« Last Edit: October 21, 2015, 11:35:22 PM by matthey »
 

Offline matthey

Re: New ppc board by Acube/A-Eon: A1222 "Tabor"
« Reply #22 on: October 24, 2015, 06:43:40 PM »
Quote from: LiveForIt;798023
Most of OS does not use FLOAT, but most of the programs that run on the OS that play sound and video use float, and most of this are ported from Linux, compiled with static linking.


The most popular free sound and video players are based on mplayer and ffmpeg. There are all-integer codecs/plugins. The 68k mpega.library, Riva and MooVid show what fast integer codecs can do. My 68060@75MHz plays MP3s and MPEG videos (up to 640x480) without problems despite lacking 32x32=64 bit integer multiply in hardware and lacking fast DSP like instructions which accelerate integer codecs. I wouldn't be surprised if the Apollo core in FPGA is able to play 720p video using an integer codec.
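
For readers wondering what a missing 32x32=64 multiply costs in software, one common way to build it is from four 16x16 partial products. The sketch below shows that general technique; it is not taken from any particular 68060 support code, and the function name is made up.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def mulu_32x32_64(a, b):
    """Unsigned 32x32->64 multiply from four 16x16->32 partial products.

    Returns (high, low) as two 32 bit halves, the way a register pair would hold them.
    """
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16

    lo_lo = a_lo * b_lo
    hi_lo = a_hi * b_lo
    lo_hi = a_lo * b_hi
    hi_hi = a_hi * b_hi

    # Bits 16..31 of the result: sum the middle column and keep its carry.
    mid = (lo_lo >> 16) + (hi_lo & MASK16) + (lo_hi & MASK16)
    low = ((mid & MASK16) << 16) | (lo_lo & MASK16)
    high = hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (mid >> 16)
    return high & MASK32, low & MASK32

high, low = mulu_32x32_64(0xFFFFFFFF, 0xFFFFFFFF)
print(f"{high:08X}{low:08X}")
```

Four multiplies plus the shifts and adds replace one instruction, which is why integer codecs that lean on wide multiplies notice the difference.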

Quote from: LiveForIt;798023

I tell you right now, I'm not going to waste my time down grading software to support slow CPU's without FPU. Having to jump to math library, simple fmuls and fdivs, has overhead, jumps and branches have overhead, registers has to be put in stack, and restored, and return values has be set and so on, I think the math library idea from 90's was wacky idea to begin with. And is not a good direction to be going.


I agree with what you are saying here. There are developers who will not want to mess with recompiling software for a non-standard CPU. Standard hardware allows for more optimal software without the need for fat binaries, patching and creating hardware specific codecs/plugins and/or recompiling for specific hardware. The amount of developer time wasted for multiple compiles, patching and debugging on non-standard hardware is often underestimated and especially deadly to a small market with limited development resources. Standard hardware floating point support using direct floating point instructions in the code is needed for competitive modern general purpose computing. Without this, the Amiga will fall further and further behind. Embedded PPC processors are already behind the curve and the situation is likely to get worse.

Quote from: LiveForIt;798023

It's more likely be using more double floats in my programs, because its not good idea to mix and match int with float, because of overhead; a FPU while its integrated in CPU this days. Actually works as independent processor unit, there is no way to move a value from a CPU register into a FPU register without storing it on RAM, so makes more sense to write code for the FPU or write the code for CPU instructions. Do not do casting between float and int, or at least as little as possible.


A superscalar CPU has independent units which work in parallel as you say. This makes it advantageous to work on mixed (integer, FPU, SIMD) instructions in the code as otherwise the unused units are idle. The compiler instruction scheduler will schedule these instructions to keep as many CPU units busy as possible even if integer and floating point parts of the source are separated. You are also correct that transferring data between CPU units can have significant overhead. Pipeline bubbles can be generated when transferring data between units using instructions or the DCache with longer pipelines giving longer stalls. Depending on the CPU design, these stalls may be avoided by not touching the transferred data for several cycles. The instruction scheduler may improve these situations also although the results can vary. In general, mixing float and integer is good but the programmer should try to reduce the number of float<->int conversions.
 

Offline matthey

Re: New ppc board by Acube/A-Eon: A1222 "Tabor"
« Reply #23 on: October 25, 2015, 06:29:27 PM »
Quote from: LiveForIt;798086
Sure, you can reserve a few gate arrays for DCT, deblocking and color space conversion to accelerate speed on FPGA. However, I do not think you will be able to play H264 video by just simulating normal CPU instructions. But FPGAs are pretty expensive. How many gates will you need to do that on FPGA, and will it cost you more than buying an ASIC from Broadcom, VIA or Intel?

Accelerating video processing by SIMD and DSP like instructions in the CPU has advantages for many types of programs. CPU non-general purpose video support like Intel's Quick Sync probably gives the most acceleration but I don't know if it is necessary. The following is a benchmark of encoding a 449 MB, four-minute 1080p file to 1024×768 (CPU is a Core i7 3770).

Software encoding = 172 seconds
AMD Radeon HD 6870 = 86 seconds
Nvidia GeForce GTX 570 = 83 seconds
Quick Sync Video in i7 = 22 seconds
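
Expressed as speedups over software encoding, the four timings above work out as follows (a quick sketch; only the timings are from the benchmark, the variable names are arbitrary):

```python
encode_seconds = {
    "Software encoding": 172,
    "AMD Radeon HD 6870": 86,
    "Nvidia GeForce GTX 570": 83,
    "Quick Sync Video in i7": 22,
}

baseline = encode_seconds["Software encoding"]
speedups = {name: baseline / secs for name, secs in encode_seconds.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

That puts the two graphics cards at roughly 2x and Quick Sync at close to 8x.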

This shows how effective CPU specific acceleration can be but it also may benefit from being a popular hardware standard. Video acceleration on these gfx boards goes across the gfx bus which adds latency and the support is less standard. Integrating standard gfx on the motherboard could potentially reduce this overhead. I would like to see the Amiga go back to custom standardized hardware including the CPU and integrated graphics.
« Last Edit: October 25, 2015, 07:26:20 PM by matthey »
 

Offline matthey

Re: New ppc board by Acube/A-Eon: A1222 "Tabor"
« Reply #24 on: October 26, 2015, 03:56:45 PM »
Quote from: nicholas;798148
Who in their right mind does anything on an Amiga system in this day and age?

Because we can, no other reason is necessary ;-)


We should be able to do general purpose computing on the Amiga even if it is slower than the competition. The challenge of a small market like the Amiga is how to make it easier, cheaper and faster so it can be used more.

Quote from: bison;798154
There was a discussion on EAB about that.  Can't find it right now...


This thread?

http://eab.abime.net/showthread.php?t=79940