It's not necessarily a bad idea; you just have to know to what end the patches are created. Collapsing more complex assembly code into less complex code, saving space and reducing execution time, used to get a lot more respect when storage space was scarce and CPUs were far less powerful. Like, say, in the 1980s and 1990s.
Yes, indeed, these were the "even older days" of computing. Back then, in the 6502 era, squeezing more program into less RAM was pretty much a necessity, given how little of it you had. I remember back then on the Atari (yes, the Atari 800XL; same chip designer, different company) that the file management system (back then called "DOS") was bootstrapped from disk, took probably 5K of your precious RAM, and had pretty limited capabilities. Plus it took time to bootstrap that 5K (it wasn't a 1541, so it wasn't as bad as on the C64, after all).
Indeed, one can try to rewrite the whole thing, throw out the less-used part of the ROM (the support for a parallel port interface that came so late to the market that no devices were ever made to fit into this port), and fill the newly available 3K of ROM with a more powerful replacement for the 5K DOS, cleaning up the math routines along the way. For such extremely tiny systems, this type of hobby did make sense because it was a noticeable improvement (as in: 5K more for your programs out of the total of 40K available). Not that it was commercially viable - it wasn't.
Anyhow, byte counting stopped making sense already when 512K was the norm, and priorities changed. As soon as projects get bigger, one starts to notice that there is no benefit in squeezing out every possible byte, or every possible optimization. There is too much code to look at, and the problems are typically about maintaining the whole construction rather than making it fast.
As Olsen already said, either execution time is not critical because I/O or human input limits the speed, or 80% of the program time is spent in less than 20% of the program. In that case, those 20% are hand-tuned, possibly written in assembly. For the 68K, I did this myself. Nowadays, not even that anymore; we had a specialist for it in the company when I worked on problems that required this type of activity. Even then, it turns out that the really critical part is not the algorithm itself, but keeping the data in cache, i.e. constructing the algorithm around the "worker" such that data is ideally pipelined, and that again was done in a high-level language (C++).
To keep the story short, even today the use of assembly, even for optimization, is diminishing. There are hot spots where you have to use it, but if speed is essential, you typically want to be as flexible as possible: to rearrange your data structures to allow for fast algorithms, and to organize data such that the access pattern fits the CPU's organization - and you don't get this flexibility in assembly. It sounds weird, but a high-level language and more code can sometimes make an algorithm faster.
But anyhow, I confess I did byte counting in the really old days, two computer generations ago, and yes, it created a good deal of spaghetti code, though the requirements were quite a bit different.
http://www.xl-project.com/download/os++.tar.gz

It's part of becoming a good engineer to learn which tools you need to reach your goal, which tools to pick for a specific use case, and foremost to understand what the problem actually is (this is more complicated than one may guess). Ill-defined problems create ill-defined program architecture. Not saying that I'm the perfect software engineer - I have no formal education in this sector - but at least I learned a bit by failing often enough.