Well, I wouldn't want to interfere with what you're doing,
but here is some advice from hard-earned experience.
The advantages of Assembly
Assembly can express very low-level things:
- you can access machine-dependent registers and I/O.
- you can control the exact behavior of code
in critical sections that might otherwise involve deadlock
between multiple software threads or hardware devices.
- you can break the conventions of your usual compiler,
which might allow some optimizations
(like temporarily breaking rules about memory allocation,
threading, calling conventions, etc).
- you can build interfaces between code fragments
that use incompatible conventions
(e.g. produced by different compilers,
or separated by a low-level interface).
- you can get access to unusual programming modes of your processor
(e.g. 16 bit mode to interface startup, firmware, or legacy code
on Intel PCs)
- you can produce reasonably fast code for tight loops
to cope with a bad non-optimizing compiler
(but then, there are free optimizing compilers available!)
- you can produce code whose timing is predictable
(but only on CPUs with known instruction timings,
which generally excludes all current ....)
- you can produce hand-optimized code
that's perfectly tuned for your particular hardware setup,
though not to anyone else's.
- you can write some code for your new language's
optimizing compiler
(that's something few will ever do, and even they, not often).
The disadvantages of Assembly
Assembly is a very low-level language
(the lowest above hand-coding the binary instruction patterns).
This means
- it's long and tedious to write initially,
- it's very bug-prone,
- your bugs will be very difficult to chase,
- it's very difficult to understand and modify,
i.e. to maintain.
- the result is very non-portable to other architectures,
existing or future,
- your code will be optimized only for a certain implementation
of the same architecture:
for instance, among Intel-compatible platforms,
each CPU design and its variations
(relative latency, throughput, and capacity,
of processing units, caches, RAM, bus, disks,
presence of FPU, MMX, 3DNOW, SIMD extensions, etc)
implies potentially completely different optimization techniques.
CPU designs already include:
Intel 386, 486, Pentium, PPro, Pentium II, Pentium III;
Cyrix 5x86, 6x86; AMD K5, K6 (K6-2, K6-III), K7 (Athlon).
New designs keep popping up, so don't expect either this listing
or your code to be up-to-date.
- you spend more time on a few details,
and can't focus on small and large algorithmic design,
which is known to bring the largest part of the speedup.
[e.g. you might spend some time building very fast
list/array manipulation primitives in assembly,
when a hash table would have sped up your program much more;
or, in another context, a binary tree;
or some high-level structure distributed over a cluster of CPUs]
- a small change in algorithmic design might completely
invalidate all your existing assembly code.
So that either you're ready (and able) to rewrite it all,
or you're tied to a particular algorithmic design;
- On code that ain't too far from what's in standard benchmarks,
commercial optimizing compilers outperform hand-coded assembly
(well, that's less true on the x86 architecture
than on RISC architectures,
and perhaps less true for widely available/free compilers;
anyway, for typical C code, GCC is fairly good);
- And in any case, as moderator John Levine says on
comp.compilers,
"compilers make it a lot easier to use complex data structures,
and compilers don't get bored halfway through
and generate reliably pretty good code."
They will also correctly propagate code transformations
throughout the whole (huge) program
when optimizing code across procedure and module boundaries.
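To make the hash-table remark above concrete, here is a minimal sketch in C. Everything in it (the names contains_linear, int_set, set_insert, set_contains, the table size, the hash constant) is a hypothetical illustration, not code from any particular program. It assumes non-negative integer keys and a table that is never filled completely.

```c
#include <stddef.h>

/* O(n) per lookup: scan every element. Hand-tuning this loop in
   assembly speeds it up by a constant factor at best. */
int contains_linear(const int *keys, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (keys[i] == key)
            return 1;
    return 0;
}

#define TABLE_SIZE 1024u   /* power of two; must exceed the number of keys */
#define EMPTY (-1)         /* sentinel: keys are assumed non-negative */

typedef struct { int slot[TABLE_SIZE]; } int_set;

/* Multiplicative hash reduced to a table index. */
static unsigned hash_int(int key)
{
    return ((unsigned)key * 2654435761u) & (TABLE_SIZE - 1u);
}

void set_init(int_set *s)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        s->slot[i] = EMPTY;
}

/* Insert with linear probing; the caller keeps the table less than full. */
void set_insert(int_set *s, int key)
{
    unsigned h = hash_int(key);
    while (s->slot[h] != EMPTY && s->slot[h] != key)
        h = (h + 1u) & (TABLE_SIZE - 1u);
    s->slot[h] = key;
}

/* Expected O(1) per lookup while the table stays sparsely filled. */
int set_contains(const int_set *s, int key)
{
    unsigned h = hash_int(key);
    while (s->slot[h] != EMPTY) {
        if (s->slot[h] == key)
            return 1;
        h = (h + 1u) & (TABLE_SIZE - 1u);
    }
    return 0;
}
```

Both lookups give the same answers; the difference is in how the cost grows with the number of keys, which no amount of instruction-level tuning of the linear scan can change.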
Assessment
All in all, you might find that though using assembly is sometimes needed,
and might even be useful in a few cases where it is not, you'll want to:
- minimize the use of assembly code,
- encapsulate this code in well-defined interfaces
- have your assembly code automatically generated
from patterns expressed in a higher-level language
than assembly (e.g. GCC inline assembly macros).
- have automatic tools translate these programs
into assembly code
- have this code be optimized if possible
- All of the above,
i.e. write (an extension to) an optimizing compiler back-end.
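As an illustration of the first two points (minimize assembly, and hide it behind a well-defined interface), here is a sketch of a C function that wraps a single x86 instruction using GCC extended inline assembly. The function name rotl32 is hypothetical, and the sketch assumes a GCC-compatible compiler on x86; on anything else it falls back to plain C.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n is taken modulo 32).
   Callers see an ordinary C function; the assembly stays private. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* GCC extended asm, AT&T syntax: rotate operand 0 by the count in %cl.
       "+r": x is read and written in a general register.
       "c":  n is placed in ecx, whose low byte is %cl.
       "cc": the instruction modifies the flags. */
    __asm__ ("roll %%cl, %0" : "+r"(x) : "c"(n) : "cc");
    return x;
#else
    /* Portable fallback with the same result. */
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
#endif
}
```

Because the assembly is confined to one small function, porting to another architecture means touching only this spot, and the fallback branch documents what the instruction is supposed to compute.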
Even in cases when assembly is needed (e.g. OS development),
you'll find that not so much of it is,
and that the above principles hold.
See the Linux kernel sources concerning this:
as little assembly as needed,
resulting in a fast, reliable, portable, maintainable OS.
Even a successful game like DOOM was written almost entirely in C,
with only a tiny part written in assembly for speed.
General procedure to achieve efficient code
As Charles Fiterman says on
comp.compilers
about human vs computer-generated assembly code,
"
The human should always win and here is why.
- First the human writes the whole thing in a high level language.
- Second he profiles it to find the hot spots where it spends its time.
- Third he has the compiler produce assembly for those small
sections of code.
- Fourth he hand tunes them looking for tiny improvements over
the machine generated code.
The human wins because he can use the machine.
"
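The second step of the procedure quoted above (finding hot spots) is best done with a real profiler, such as gprof on code built with gcc -pg. As a rough stand-in, the following sketch times a suspected hot spot with the standard clock() function. The workload sum_of_squares and the helper time_workload are hypothetical names chosen for illustration.

```c
#include <time.h>

/* Hypothetical workload standing in for a suspected hot spot. */
unsigned long sum_of_squares(unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 1; i <= n; i++)
        total += i * i;
    return total;
}

/* Return the CPU time, in seconds, consumed by `reps` calls of the
   workload. Results are accumulated into *sink, a volatile object,
   so the compiler cannot discard them entirely. */
double time_workload(unsigned long n, int reps, volatile unsigned long *sink)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        *sink += sum_of_squares(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Measurements this coarse only tell you where to look; they are no substitute for a profiler's per-function breakdown, and they should be repeated before you trust them.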
Languages with optimizing compilers
Languages like Objective Caml, SML, Common Lisp, Scheme, Ada, Pascal, C, C++,
among others, all have free optimizing compilers
that will optimize the bulk of your programs,
and often do better than hand-coded assembly even for tight loops,
while allowing you to focus on higher-level details,
and without forbidding you to grab
a few percent of extra performance in the above-mentioned way,
once you've reached a stable design.
Of course, there are also commercial optimizing compilers
for most of these languages, too!
Some languages have compilers that produce C code,
which can be further optimized by a C compiler:
LISP, Scheme, Perl, and many others.
Speed is fairly good.
General procedure to speed your code up
As for speeding code up,
you should do it only for parts of a program
that a profiling tool has consistently identified
as being a performance bottleneck.
Hence, if you identify some code portion as being too slow, you should
- first try to use a better algorithm;
- then try to compile it rather than interpret it;
- then try to enable and tweak optimization from your compiler;
- then give the compiler hints about how to optimize
(typing information in LISP; register usage with GCC;
lots of options in most compilers, etc).
- then possibly fall back to assembly programming.
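As one example of giving the compiler a hint, C99 offers the restrict qualifier, which promises that two pointers do not refer to overlapping memory. The function below is a hypothetical sketch (the name scale_add is made up); whether the hint actually changes the generated code depends on your compiler and options, so inspect the output before relying on it.

```c
#include <stddef.h>

/* Compute y[i] += a * x[i] for i in [0, n).
   `restrict` tells the compiler that x and y never overlap, which may
   let it reorder or vectorize the loads and stores more freely. */
void scale_add(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

The promise is the caller's responsibility: passing overlapping arrays to this function is undefined behavior.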
Finally, before you end up writing assembly,
you should inspect generated code,
to check that the problem really is with bad code generation,
as this might really not be the case:
compiler-generated code might be better than what you'd have written,
particularly on modern multi-pipelined architectures!
Slow parts of a program might be intrinsically so.
The biggest problems on modern architectures with fast processors
are due to delays from memory access, cache misses, TLB misses,
and page faults;
register optimization becomes useless,
and you'll more profitably re-think data structures and threading
to achieve better locality in memory access.
Perhaps a completely different approach to the problem might help, then.
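A small sketch of what better locality means in practice: the two hypothetical functions below compute the same sum over a matrix stored row by row in a flat array, but they visit memory in different orders. On large matrices the first typically makes far better use of caches, although the exact difference depends on your hardware.

```c
#include <stddef.h>

/* Walk the matrix in storage order: consecutive iterations touch
   adjacent memory, so each cache line fetched is fully used. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += m[i * cols + j];
    return total;
}

/* Walk the same matrix column by column: consecutive iterations are
   `cols` elements apart, which tends to miss the cache on large inputs. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            total += m[i * cols + j];
    return total;
}
```

No register-level tuning of the second loop will recover what the access pattern loses; swapping the loop order, a change made entirely in the high-level language, is the fix.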
Inspecting compiler-generated code
There are many reasons to inspect compiler-generated assembly code.
Here is what you'll do with such code:
- check whether generated code
can be obviously enhanced with hand-coded assembly
(or by tweaking compiler switches)
- when that's the case,
start from generated code and modify it
instead of starting from scratch
- more generally, use generated code as stubs to modify,
which at least gets right the way
your assembly routines interface to the external world
- track down bugs in your compiler (hopefully rarer)
The standard way to have assembly code be generated
is to invoke your compiler with the -S flag.
This works with most Unix compilers,
including the GNU C Compiler (GCC), but YMMV.
As for GCC, it will produce more understandable assembly code with
the -fverbose-asm command-line option.
Of course, if you want to get good assembly code,
don't forget your usual optimization options and hints!
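For instance, here is a tiny function you could feed to the compiler to see what it generates. The file name sum.c and the function sum_to are hypothetical; the flags are the ones discussed above.

```c
/* Save as sum.c, then run:
 *
 *     gcc -O2 -S -fverbose-asm sum.c
 *
 * GCC should leave the annotated assembly in sum.s.
 * Try again without -O2 to see how much the optimizer changes. */
unsigned sum_to(unsigned n)
{
    unsigned total = 0;
    /* Count down so the loop terminates for every value of n. */
    for (unsigned i = n; i > 0; i--)
        total += i;
    return total;
}
```

Reading the output for a function this small is a good way to get used to your compiler's conventions before you look at real hot spots.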
In general, you don't need to use assembly language in Linux programming.
Unlike DOS, you do not have to write Linux drivers in assembly
(well, actually you can do it if you really want).
And with modern optimizing compilers,
if you care about speed optimization for different CPUs,
it's much simpler to write in C.
However, if you're reading this,
you might have some reason to use assembly instead of C/C++.
You may need to use assembly, or you may want to use assembly.
In short, the main practical reasons why you may need to get into Linux assembly
are small code size and libc independence.
The non-practical (and most frequent) reason is simply being a crazy old hacker
with a twenty-year-old habit of doing everything in assembly language.
Also, if you're porting Linux to some embedded hardware,
you can be quite short on space for the whole system:
you need to fit the kernel, libc,
and all the (file|find|text|sh|etc.) utils
into several hundred kilobytes,
and every kilobyte is costly.
So one option is to rewrite some
(or all) parts of the system in assembly,
which will really save you a lot of space.
For instance, a simple httpd
written in assembly
can take less than 600 bytes;
you can fit a web server, consisting of a kernel and httpd,
in 400 KB or less... Think about it.