Bundle dependency jars inside your job jar instead of using -libjars. Your jar file gets pushed to every node. Keep your MapReduce code small; don't ship large libraries, because that means running large VMs on the nodes.
Hook for logging and status. OK to use log4j or stdout/stderr and then read the output for each process from the JobTracker.
What mappers and reducers write to
MapReduceBase, Mapper, Reducer.
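The flow these classes take part in can be sketched in plain Java. This is a minimal illustration, not Hadoop code: it uses no Hadoop classes, and WordCountSketch, Collector, map, reduce, and run are hypothetical names invented here. The mapper emits (word, 1) pairs into a collector, the pairs are grouped by key, and the reducer sums each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the map -> group -> reduce flow. No Hadoop classes are
// used; every name below is a hypothetical stand-in for illustration only.
class WordCountSketch {

    /** Hypothetical stand-in for the object mappers and reducers write to. */
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    /** Map step: emit (word, 1) for every whitespace-separated token in a line. */
    static void map(String line, Collector<String, Integer> out) {
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.collect(token.toLowerCase(), 1);
            }
        }
    }

    /** Reduce step: sum all counts seen for one key and emit the total. */
    static void reduce(String key, List<Integer> values, Collector<String, Integer> out) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        out.collect(key, sum);
    }

    /** Runs map over every line, groups values by key, then reduces each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            map(line, (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            reduce(e.getKey(), e.getValue(), result::put);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {a=2, b=2, c=1}
        System.out.println(run(List.of("a b a", "b c")));
    }
}
```

In a real job the framework does the grouping across nodes; the sketch keeps it in one TreeMap so the whole flow is visible in a single method.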
JobClient and JobConf are the job management interfaces. JobClient specifies main and args. JobConf sets memory, number of mappers, reducers, etc.
Use this for line-based input
Sequence files allow multiple files to be packed into one 64MB block.
mapred is the old but functional API. mapreduce is the future version.