The mapred API is old but functional. The mapreduce API is the newer version intended to replace it.
Sequence files let you pack multiple small files into one 64 MB block
Use this for line-based input
JobClient and JobConf are the job management interfaces. JobClient specifies main and args. JobConf sets memory, the number of mappers and reducers, etc.
MapReduceBase, Mapper, Reducer.
What mappers and reducers write to
Hook for logging and status. It is also OK to use log4j or stdout/stderr and then read the output for each process from the jobtracker.
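To make the roles above concrete, here is a minimal plain-Java sketch of a mapper and a reducer that both write to a collector. It does not use Hadoop at all: the Collector interface, the WordCountSketch class, and the in-memory grouping step are hypothetical stand-ins (in the mapred API the collector role is played by OutputCollector), and word count is only an assumed example job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    /** Hypothetical stand-in for the object that mappers and reducers write to. */
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    /** Map step: one input line in, one (word, 1) pair out per word. */
    static void map(String line, Collector<String, Integer> out) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.collect(word, 1);
            }
        }
    }

    /** Reduce step: all values for one key in, one (word, total) pair out. */
    static void reduce(String key, List<Integer> values, Collector<String, Integer> out) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        out.collect(key, sum);
    }

    /** Runs map, an in-memory stand-in for the shuffle, then reduce. */
    static Map<String, Integer> run(List<String> lines) {
        // Group mapper output by key; a real framework does this across nodes.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            map(line, (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // Reducers write their totals to a collector as well.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            reduce(e.getKey(), e.getValue(), result::put);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not", "to be");
        // Status goes to stderr, results to stdout.
        System.err.println("processing " + lines.size() + " lines");
        System.out.println(run(lines)); // prints {be=2, not=1, or=1, to=2}
    }
}
```

Keeping map and reduce as functions that only see their input and a collector is the point of the sketch: neither step knows where its output goes, which is what lets a framework reroute it between nodes.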
Put jars in your code instead of using libjars. Your jar file gets pushed to every node, so keep your mapreduce code small and don't send large libraries, because you run large VMs on nodes.