@Hyperspeed
Ultimately, as you know, computers deal with binary data, i.e. a simple 0/1 state.
An individual bit is not useful beyond representing a basic "on/off" or "true/false" type property.
In order to represent numbers, bits are grouped together. If you have N bits together, they can represent 2^N (that's 2 to the power N) possible states. So, an 8-bit byte can represent 2^8 == 256 possible states. This leads to the notion of the binary number system and arithmetic, where each bit represents a power of 2 (just as you have units, tens, hundreds etc in decimal arithmetic).
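To make that concrete, here's a little C sketch (the byte value is just an arbitrary example) that pulls a byte apart into its powers of two:

```c
#include <stdio.h>

/* A quick illustration of the binary number system: each set bit in
 * a byte contributes its power of two, which is how 8 bits give you
 * 2^8 = 256 possible states (0..255). */
int main(void)
{
    unsigned char value = 180;   /* arbitrary example: 10110100 */
    unsigned int  sum   = 0;

    for (int i = 7; i >= 0; i--) {
        int bit = (value >> i) & 1;
        printf("%d", bit);       /* print the bits, high to low */
        sum += bit << i;         /* add 2^i if this bit is set */
    }
    printf(" = %u (128 + 32 + 16 + 4)\n", sum);
    return 0;
}
```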
Provided all you ever want to do is deal with whole numbers, your basic binary arithmetic is fine for most purposes. This is the "integer" arithmetic that your classic CPU performs.
Unfortunately, a lot of real-world calculation involves imprecise numbers that can span enormous orders of magnitude. If you are familiar with scientific notation, you'll know what I am talking about. Integer arithmetic becomes hugely impractical for this type of calculation.
If you think of scientific notation, m x 10^e, where m is the (signed) mantissa, from 1 to 9.9999r, and e is the (signed) exponent, you can see that you could come up with a similar notation in binary: m x 2^e. The IEEE came up with a standard binary representation for this notation, IEEE 754, in which the mantissa is normalised to the range 1 to 1.999r (so its leading bit is always 1 and doesn't even need to be stored) and a total of 32 bits can represent a value to roughly seven significant decimal digits, from around 10^-38 to about 10^+38. There's a 64-bit version that stores values to roughly fifteen significant digits with a much larger exponent range (around 10^-308 to 10^+308).
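To see those fields in practice, here's a small C sketch that unpacks a single-precision value into its sign, exponent and mantissa. It's a minimal illustration, assuming a normalised value; the field widths and the bias of 127 are from the IEEE 754 single format:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Pull apart an IEEE 754 single-precision value into its three
 * fields: 1 sign bit, 8 exponent bits (biased by 127) and 23
 * mantissa bits. Normalised values only. */
int main(void)
{
    float f = -6.25f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret, don't convert */

    uint32_t sign     = bits >> 31;
    int32_t  exponent = ((bits >> 23) & 0xFF) - 127;  /* remove bias */
    uint32_t mantissa = bits & 0x7FFFFF;

    /* The stored mantissa omits the implicit leading 1. */
    double m = 1.0 + mantissa / (double)0x800000;

    printf("value = %s%f x 2^%d\n", sign ? "-" : "", m, exponent);
    return 0;
}
```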
Working with these binary formats is a bit more complex than the simple integer case. As not every pattern of 32 or 64 bits is a valid representation, you have to keep your values properly formatted; addition, subtraction, multiplication and division are all more complex than their integer equivalents. In short, each mathematical operation on a floating-point number requires a long sequence of integer operations: unpacking the fields, aligning the exponents, operating on the mantissas and renormalising the result. Consequently, doing it all in software is rather slow.
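To give you a feel for that grind, here's a heavily simplified soft-float addition in C. It's only a sketch: it assumes both inputs are positive, normalised single-precision values and truncates instead of rounding (real soft-float code handles signs, zeros, infinities, NaNs and rounding modes on top of this), but the unpack/align/add/renormalise sequence is the real thing:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static float soft_add(float fa, float fb)
{
    uint32_t a, b;
    memcpy(&a, &fa, sizeof a);   /* reinterpret the bit patterns */
    memcpy(&b, &fb, sizeof b);

    /* Unpack the exponents and mantissas, restoring the implicit
     * leading 1 of a normalised mantissa. */
    int32_t  ea = (a >> 23) & 0xFF;
    int32_t  eb = (b >> 23) & 0xFF;
    uint32_t ma = (a & 0x7FFFFF) | 0x800000;
    uint32_t mb = (b & 0x7FFFFF) | 0x800000;

    /* Align the smaller exponent to the larger by shifting its
     * mantissa right (losing precision, as real hardware does). */
    if (ea < eb) { ma >>= (eb - ea); ea = eb; }
    else if (eb < ea) { mb >>= (ea - eb); }

    uint32_t m = ma + mb;
    int32_t  e = ea;

    /* Renormalise: the sum may have carried into bit 24. */
    if (m & 0x1000000) { m >>= 1; e++; }

    /* Repack (truncating instead of rounding, unlike a real FPU). */
    uint32_t r = ((uint32_t)e << 23) | (m & 0x7FFFFF);
    float fr;
    memcpy(&fr, &r, sizeof fr);
    return fr;
}

int main(void)
{
    printf("%f\n", soft_add(1.5f, 2.25f));  /* prints 3.750000 */
    return 0;
}
```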
An FPU is a dedicated processor that knows how to work with these formats natively. Instead of the CPU having to grind through a long list of instructions to add a pair of floating-point numbers, the FPU does it directly, often in far fewer cycles than the CPU needs to execute the equivalent sequence of integer operations.
The 68882 didn't stop at providing the basic add, subtract, divide, multiply and compare operations for floating-point numbers. It also provided more complex operations such as square root, sine, cosine, tangent, log, exponentiation and other "transcendental" functions. They typically weren't very quick (some could take over 100 cycles to execute), but they were still far faster than anything the 68020/68030 it was designed to support could manage in software.
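For a sense of what the 68882 was saving you, here's roughly what a software sine looks like: a truncated Taylor series, where every extra bit of accuracy costs more multiplies and divides. (This particular series is just an illustration; real emulation code uses more sophisticated polynomial approximations.)

```c
#include <stdio.h>

/* A rough sketch of what a transcendental costs in basic operations:
 * sin(x) via a truncated Taylor series. Each extra term is another
 * multiply-divide-add, which is why a single FSIN instruction, even
 * at 100+ cycles, beat a software loop like this. */
static double sine_taylor(double x)
{
    double term = x;      /* first term of the series: x */
    double sum  = x;
    for (int n = 1; n <= 7; n++) {
        /* next term = -term * x^2 / ((2n)(2n+1)) */
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum  += term;
    }
    return sum;
}

int main(void)
{
    printf("%f\n", sine_taylor(1.0));  /* ~0.841471 */
    return 0;
}
```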
With the 68040/68060, all the transcendental stuff was dropped, with the sole exception of the square root (a mathematical primitive that is extremely important and also not that expensive to evaluate). Instead, the designers focused on making the basic operations as fast as possible and left the missing instructions to be emulated in software.
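Square root is the odd one out because a simple iteration gets you there quickly. Here's a Newton-Raphson sketch in C (the starting guess is crude, just for illustration; real implementations pick a much better one):

```c
#include <stdio.h>

/* Why square root is cheap compared to the transcendentals:
 * Newton-Raphson converges quadratically, so a handful of
 * iterations of one add, one divide and one multiply gets you
 * full precision. */
static double newton_sqrt(double x)
{
    double guess = x > 1.0 ? x / 2.0 : 1.0;  /* crude starting point */
    for (int i = 0; i < 6; i++)
        guess = 0.5 * (guess + x / guess);   /* refine the estimate */
    return guess;
}

int main(void)
{
    printf("%f\n", newton_sqrt(2.0));  /* ~1.414214 */
    return 0;
}
```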