This document compares the performance of four file formats (textfile, Avro, Parquet, ORC) for storing large datasets in Hadoop. The experiments load the same datasets into tables backed by each format and measure both the on-disk (compressed) size and query execution time. The results show that the columnar formats provide the most compact storage, compressing data to about one-third the size of the original textfile; ORC and Parquet deliver similar query execution performance, with ORC files coming out slightly smaller. In general, the binary formats (Avro, Parquet, ORC) outperform textfile in both storage efficiency and query performance.
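The measurement methodology described above can be sketched in a few lines. The snippet below is a hedged, standard-library-only illustration (it does not produce real Parquet, ORC, or Avro files, which require their respective writer libraries): it writes the same rows once as gzip-compressed delimited text and once as a gzip-compressed column-packed binary layout, then compares file sizes the same way the experiments do, with `os.path.getsize`. All file names and the row schema here are invented for the example.

```python
import csv
import gzip
import os
import struct
import tempfile

# Toy dataset standing in for the benchmark tables: (id, value, user) rows.
rows = [(i, i * 0.5, f"user{i % 100}") for i in range(50_000)]

tmp = tempfile.mkdtemp()
text_path = os.path.join(tmp, "data.csv.gz")   # row-oriented text format
bin_path = os.path.join(tmp, "data.col.gz")    # column-oriented binary format

# Row-oriented text: every value is serialized as characters, row by row,
# analogous to a compressed Hadoop textfile.
with gzip.open(text_path, "wt", newline="") as f:
    csv.writer(f).writerows(rows)

# Column-oriented binary: each column is packed contiguously with fixed-width
# encodings, so similar values sit next to each other -- the property that
# lets columnar formats like Parquet and ORC compress well.
with gzip.open(bin_path, "wb") as f:
    f.write(struct.pack(f"{len(rows)}q", *(r[0] for r in rows)))  # int64 column
    f.write(struct.pack(f"{len(rows)}d", *(r[1] for r in rows)))  # float64 column
    f.write("\n".join(r[2] for r in rows).encode())               # string column

# The size comparison step used throughout the experiments.
print("text:", os.path.getsize(text_path), "bytes")
print("binary columnar:", os.path.getsize(bin_path), "bytes")
```

In the actual experiments this comparison is done on Hadoop tables (e.g. via `CREATE TABLE ... STORED AS PARQUET` in Hive) rather than local files, but the principle is identical: write the same data in each format, then compare the resulting storage footprint.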