SQUASHFS 1.1 - A squashed read-only filesystem for Linux

Copyright 2003 Phillip Lougher (phillip@lougher.demon.co.uk)

Released under the GPL licence (version 2 or later).

Squashfs is a highly compressed read-only filesystem for Linux (kernel 2.4.x).
It uses zlib compression to compress files, inodes and directories.
Inodes in the system are very small and all blocks are packed to minimise
data overhead. Block sizes greater than 4K are supported, up to a maximum
of 32K.

Squashfs is intended for general read-only filesystem use, for archival
use (i.e. in cases where a .tar.gz file may be used), and in constrained
block device/memory systems (e.g. embedded systems) where low overhead is
needed.

The filesystem is currently stable, and has been tested on PowerPC, i586
and Sparc architectures.

Squashfs overview
-----------------

1. Data, inodes and directories are compressed.

2. Squashfs stores full uid/gids (32 bits), and file creation time.

3. Files up to 2^32 bytes are supported. Filesystems can be up to
   2^32 bytes.

4. Inode and directory data are highly compacted, and packed on byte
   boundaries. Each compressed inode is on average 8 bytes in length
   (the exact length varies with file type, i.e. regular file, directory,
   symbolic link, and block/char device inodes have different sizes).

5. Squashfs can use block sizes up to 32K (the default size is 32K).
   Using 32K blocks achieves greater compression ratios than the normal
   4K block size.

6. File duplicates are detected and removed.

7. Both big and little endian architectures are supported. Squashfs can
   mount filesystems created on machines with a different byte order.


mksquashfs
----------

As squashfs is a read-only filesystem, the mksquashfs program must be used to
create populated squashfs filesystems.

SYNTAX: ./mksquashfs source1 source2 ... dest [options] [-e list of exclude dirs/files]

Options are
-info                       print files written to filesystem
-b block size               size of blocks in filesystem, default 32768
-noI -noInodeCompression    do not compress inode table
-noD -noDataCompression     do not compress data blocks
-nopad                      do not pad filesystem to a multiple of 4K
-check_data                 add checkdata for greater filesystem integrity checks
-le                         create a little endian filesystem
-be                         create a big endian filesystem

Source1 source2 ... are the source directories/files containing the
files/directories that will form the squashfs filesystem. If a single
directory is specified (i.e. mksquashfs source output_fs) the squashfs
filesystem will consist of that directory, with the top-level root
directory corresponding to the source directory.

If multiple source directories or files are specified, mksquashfs will merge
the specified sources into a single filesystem, with the root directory
containing each of the source files/directories. The name of each directory
entry will be the basename of the source path. If more than one source
entry maps to the same name, the conflicts are named xxx_1, xxx_2, etc.,
where xxx is the original name, e.g.

%mksquashfs /home/phillip/test /tmp/source2 source3 /tmp/test output_fs

Will create a filesystem with the root directory containing the directory
entries test, source2, source3 and test_1.
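The naming rule for merged sources can be sketched in C as follows. This is a minimal illustration, not mksquashfs's own code. The function names and the NAME_LEN limit are hypothetical. The sketch assigns the first free suffix (_1, _2, ...), which matches the example above, but how longer runs of clashes are numbered is an assumption.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 256  /* hypothetical name limit for this sketch */

/* Last path component of `path`; a trailing '/' is not handled here. */
static const char *path_basename(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Give each source a root directory entry name: its basename, or
 * basename_1, basename_2, ... when that name is already taken. */
static void root_entry_names(const char *sources[], int n,
                             char names[][NAME_LEN])
{
    for (int i = 0; i < n; i++) {
        const char *base = path_basename(sources[i]);
        snprintf(names[i], NAME_LEN, "%s", base);
        for (int suffix = 1;; suffix++) {
            int taken = 0;
            for (int j = 0; j < i && !taken; j++)
                taken = strcmp(names[j], names[i]) == 0;
            if (!taken)
                break;
            snprintf(names[i], NAME_LEN, "%s_%d", base, suffix);
        }
    }
}
```

Called with the four sources from the example command, it yields test, source2, source3 and test_1.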

Multiple sources allow filesystems to be generated without needing to
copy all source files into a common directory. This simplifies creating
filesystems.

Dest is the destination where the squashfs filesystem will be written. This
can either be a conventional file or a block device. If the file doesn't exist
it will be created; if it does exist it will be truncated.

The -e option allows files/directories to be specified which are
excluded from the output filesystem. If an exclude file/directory is
absolute (i.e. prefixed with /, ../, or ./), the entry is used exactly as
given. However, if an exclude file/directory is relative, it is treated as
being relative to each of the sources in turn, e.g.

%mksquashfs /tmp/source1 source2 output_fs -e ex1 /tmp/source1/ex2 out/ex3

Will generate exclude files /tmp/source1/ex2, /tmp/source1/ex1, source2/ex1,
/tmp/source1/out/ex3 and source2/out/ex3.
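The exclude expansion rule can be sketched in C. This is a minimal illustration of the rule as described, not mksquashfs's implementation. The function names and the PATH_LEN buffer size are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define PATH_LEN 4096  /* hypothetical buffer size for this sketch */

/* Entries starting with "/", "../" or "./" are used exactly as given. */
static int exclude_is_absolute(const char *ex)
{
    return ex[0] == '/' || strncmp(ex, "../", 3) == 0 ||
           strncmp(ex, "./", 2) == 0;
}

/* Expand the -e list into the paths to exclude. A relative entry is
 * applied to each source in turn. Writes at most `max` paths to `out`
 * and returns how many were written. */
static int expand_excludes(const char *sources[], int nsrc,
                           const char *excludes[], int nex,
                           char out[][PATH_LEN], int max)
{
    int n = 0;
    for (int e = 0; e < nex; e++) {
        if (exclude_is_absolute(excludes[e])) {
            if (n < max)
                snprintf(out[n++], PATH_LEN, "%s", excludes[e]);
            continue;
        }
        for (int s = 0; s < nsrc && n < max; s++)
            snprintf(out[n++], PATH_LEN, "%s/%s", sources[s], excludes[e]);
    }
    return n;
}
```

For the example command, this produces the same five paths listed above. They are simply generated in the order the -e entries are given.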

The -e option is useful when archiving the entire filesystem, where /proc
and the filesystem image being generated should not be archived, e.g.

%mksquashfs / /tmp/root.sqsh -e proc /tmp/root.sqsh

The -info option displays the files/directories as they are compressed and
added to the filesystem. The compression percentage achieved is printed, along
with the original uncompressed size. If the compression percentage is listed as
0% it means the file is a duplicate.

The -b option allows the block size to be selected; it can be 512, 1024,
2048, 4096, 8192, 16384 or 32768 bytes.
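The accepted block sizes are exactly the powers of two from 512 to 32768. A check such as the following accepts the same set. It is a hypothetical helper, not mksquashfs's own option parsing.

```c
#include <stdbool.h>

/* True if `size` is a block size -b accepts: a power of two
 * from 512 to 32768 bytes. */
static bool valid_block_size(unsigned size)
{
    if (size < 512 || size > 32768)
        return false;
    return (size & (size - 1)) == 0;  /* exactly one bit set */
}
```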

The -noI and -noD options (also -noInodeCompression and -noDataCompression)
can be used to force mksquashfs not to compress inodes/directories and data
respectively. Giving both options generates an uncompressed filesystem.

The -le and -be options can be used to force mksquashfs to generate a little
endian or big endian filesystem. Normally mksquashfs will generate a
filesystem in the host byte order. For portability, Squashfs can mount
filesystems of either byte order (i.e. it can mount big endian filesystems
on a little endian machine). These options can be used to generate a
filesystem in the byte order of the target machine, for greater optimisation.
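When a filesystem's byte order differs from the host's, every multi-byte field must be assembled byte by byte (or byte-swapped) as it is read. Presumably this is the work that matching the target machine's byte order avoids. Below is a minimal sketch of a byte-order-independent read. The function names are hypothetical. This document does not describe how Squashfs detects a filesystem's byte order, so the sketch takes the order as a parameter.

```c
#include <stdint.h>

/* Read a 16-bit field stored in the given byte order, whatever the
 * host's own byte order is. */
static uint16_t read_u16(const unsigned char *p, int big_endian)
{
    return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)((p[1] << 8) | p[0]);
}

/* The same for a 32-bit field. */
static uint32_t read_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8) | p[0];
}
```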

The -nopad option tells mksquashfs not to pad the filesystem to a 4K multiple.
Padding is performed by default so that the output filesystem file can be
mounted by loopback, which requires files to be a 4K multiple. If the
filesystem is being written to a block device, or is to be stored in a boot
image, the extra pad bytes are not needed.
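The amount of padding is a simple rounding calculation, sketched below. The helper name is hypothetical, and the value of the pad bytes themselves is not specified here.

```c
#include <stdint.h>

#define PAD_UNIT 4096u  /* loopback mounting needs a 4K multiple */

/* Number of pad bytes needed to round `fs_size` up to a 4K multiple. */
static uint32_t pad_bytes(uint32_t fs_size)
{
    return (PAD_UNIT - fs_size % PAD_UNIT) % PAD_UNIT;
}
```

For example, a 10000-byte filesystem needs 2288 pad bytes to reach 12288 bytes. One that is already a 4K multiple needs none.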

Filesystem layout
-----------------

Brief filesystem design notes follow.

A squashfs filesystem consists of five parts, packed together on a byte alignment:

 ---------------
|  superblock   |
|---------------|
|     data      |
|    blocks     |
|---------------|
|    inodes     |
|---------------|
|  directories  |
|---------------|
|    uid/gid    |
| lookup table  |
 ---------------

Compressed data blocks are written to the filesystem as files are read from
the source directory, and checked for duplicates. Once all file data has been
written, the completed inode, directory and uid/gid lookup tables are written.

Metadata
--------

Metadata (inodes and directories) are compressed in 8 Kbyte blocks. Each
compressed block is prefixed by a two byte length; the top bit of the length
is set if the block is uncompressed. A block will be uncompressed if the -noI
option is set, or if the compressed block was larger than the uncompressed
block.
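Decoding the two-byte prefix can be sketched as follows. The struct and function names are hypothetical. Two further points are assumptions: the prefix is read as little endian (see the -le/-be options above), and blocks follow one another with no gaps.

```c
#include <stdint.h>

/* Decoded two-byte prefix of a metadata block. A block holds at most
 * 8K of uncompressed metadata, so the length fits in the low 15 bits. */
struct metadata_header {
    uint16_t length;  /* bytes of block data following the prefix */
    int compressed;   /* 0 when the top bit marks the block uncompressed */
};

/* Assumption: the prefix is stored little endian. */
static struct metadata_header decode_metadata_header(const unsigned char *p)
{
    uint16_t raw = (uint16_t)(p[0] | (p[1] << 8));
    struct metadata_header h;
    h.length = (uint16_t)(raw & 0x7FFFu);
    h.compressed = (raw & 0x8000u) == 0;
    return h;
}

/* Assumption: the next block starts right after this block's prefix
 * and data. */
static uint32_t next_metadata_block(uint32_t start, struct metadata_header h)
{
    return start + 2 + h.length;
}
```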

Inodes are packed into the metadata blocks and are not aligned to block
boundaries; therefore inodes can overlap compressed blocks. An inode is
identified by a two field tuple <start address of compressed block : offset
into decompressed block>.
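The sketch below shows how a writer could derive the tuple. It assumes the inode table is built as one uncompressed stream, cut into 8K blocks and compressed. Under that assumption an inode at stream position pos lies in block pos / 8192, at offset pos % 8192. The struct, the helper names and the integer packing are hypothetical; this document does not specify an on-disk encoding of the tuple.

```c
#include <stdint.h>

#define METADATA_BLOCK 8192u  /* uncompressed metadata block size */

/* In-memory form of the <compressed block start : offset> tuple. */
struct inode_ref {
    uint32_t block_start;  /* disk address of the compressed block */
    uint16_t offset;       /* offset into the decompressed block */
};

/* block_addr[i] is the disk address at which compressed block i was
 * written; pos is the inode's position in the uncompressed stream. */
static struct inode_ref make_inode_ref(uint32_t pos, const uint32_t *block_addr)
{
    struct inode_ref r = { block_addr[pos / METADATA_BLOCK],
                           (uint16_t)(pos % METADATA_BLOCK) };
    return r;
}

/* One possible packing into a single integer (offset in the low 16 bits). */
static uint64_t pack_inode_ref(struct inode_ref r)
{
    return ((uint64_t)r.block_start << 16) | r.offset;
}

static struct inode_ref unpack_inode_ref(uint64_t v)
{
    struct inode_ref r = { (uint32_t)(v >> 16), (uint16_t)(v & 0xFFFFu) };
    return r;
}
```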

Inode contents vary depending on the file type. The base inode consists of:

base inode:
        Inode type
        Mode
        uid index
        gid index

The inode type is 4 bits in size, and the mode is 12 bits.
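Together with the two 4-bit indexes described next, the base inode occupies 4 + 12 + 4 + 4 = 24 bits, i.e. three bytes. The sketch below packs and unpacks such a three-byte base inode. The bit order and byte order chosen here are assumptions for illustration only, and the names are hypothetical.

```c
#include <stdint.h>

struct base_inode {
    unsigned inode_type;  /* 4 bits */
    unsigned mode;        /* 12 bits: enough for permission bits 07777 */
    unsigned uid;         /* 4-bit uid index */
    unsigned gid;         /* 4-bit gid index */
};

/* Assumed layout: type in bits 20-23, mode 8-19, uid 4-7, gid 0-3,
 * stored most significant byte first. */
static void pack_base_inode(const struct base_inode *b, unsigned char out[3])
{
    uint32_t v = ((uint32_t)(b->inode_type & 0xF) << 20) |
                 ((uint32_t)(b->mode & 0xFFF) << 8) |
                 ((uint32_t)(b->uid & 0xF) << 4) |
                 (uint32_t)(b->gid & 0xF);
    out[0] = (unsigned char)(v >> 16);
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)v;
}

static struct base_inode unpack_base_inode(const unsigned char in[3])
{
    uint32_t v = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];
    struct base_inode b = { (v >> 20) & 0xF, (v >> 8) & 0xFFF,
                            (v >> 4) & 0xF, v & 0xF };
    return b;
}
```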

The uid and gid indexes are 4 bits in length. Ordinarily, this would allow 16
unique indexes into the uid table. To minimise overhead, the uid index is
used in conjunction with the spare values of the 4-bit inode type to form a
48 entry index as follows:

inode type  1 -  5: uid index = uid
inode type  6 - 10: uid index = 16 + uid
inode type 11 - 15: uid index = 32 + uid

In this way 48 unique uids are supported using 4 bits, minimising inode
overhead. The 4 bit gid index is used to index into a 15 entry gid table.
Gid index 15 is used to indicate that the gid is the same as the uid.
This prevents the 15 entry gid table filling up in the common case where
the uid and gid are the same.
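The table above can be read as three groups of five inode types (1 to 5, 6 to 10, 11 to 15), each group selecting a different 16-entry slice of the uid table. The encoding and decoding can be sketched as follows. SQUASHFS_TYPES, GID_SAME_AS_UID and the function names are chosen for this sketch and are not taken from the Squashfs source.

```c
#include <stdint.h>

#define SQUASHFS_TYPES 5    /* base inode types 1..5 */
#define GID_SAME_AS_UID 15  /* gid index meaning "gid equals uid" */

/* Split a stored inode type (1..15) and 4-bit uid field into the real
 * inode type (1..5) and a uid table index (0..47). */
static void decode_uid(unsigned stored_type, unsigned uid_field,
                       unsigned *type, unsigned *uid_index)
{
    unsigned group = (stored_type - 1) / SQUASHFS_TYPES;  /* 0, 1 or 2 */
    *type = stored_type - group * SQUASHFS_TYPES;
    *uid_index = group * 16 + uid_field;
}

/* The inverse, as a writer would use it (uid_index must be < 48). */
static void encode_uid(unsigned type, unsigned uid_index,
                       unsigned *stored_type, unsigned *uid_field)
{
    unsigned group = uid_index / 16;
    *stored_type = group * SQUASHFS_TYPES + type;
    *uid_field = uid_index % 16;
}

/* Resolve the gid from the 15-entry gid table. */
static uint32_t resolve_gid(unsigned gid_field, uint32_t uid,
                            const uint32_t gid_table[15])
{
    return gid_field == GID_SAME_AS_UID ? uid : gid_table[gid_field];
}
```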

The data contents of symbolic links are stored immediately after the symbolic
link inode, inside the inode table. This allows the normally small symbolic
link to be compressed as part of the inode table, achieving much greater
compression than if the symbolic link were compressed individually.

Similarly, the block index for regular files is stored immediately after the
regular file inode. The block index is a list of block lengths (two bytes
each), rather than block addresses, saving two bytes per block. The address
of a given block is computed by summing the lengths of the preceding
blocks. This takes advantage of the fact that the blocks making up a
file are stored contiguously in the filesystem. The top bit of each block
length is set if the block is uncompressed, either because the -noD option is
set, or because the compressed block was larger than the uncompressed block.
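Computing a block address from the block index can be sketched as follows. The sketch assumes the inode also records the address of the file's first data block, which the text above does not state. A second assumption covers the one value that does not fit in 15 bits: a masked length of 0 is taken to mean an uncompressed 32768-byte block. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_UNCOMPRESSED 0x8000u  /* top bit of a block index entry */

/* On-disk size of a block from its block index entry. Assumption: a
 * zero in the low 15 bits means an uncompressed 32K block. */
static uint32_t block_disk_size(uint16_t entry)
{
    uint32_t len = entry & 0x7FFFu;
    return len ? len : 32768u;
}

static int block_is_compressed(uint16_t entry)
{
    return (entry & BLOCK_UNCOMPRESSED) == 0;
}

/* Disk address of block n: the file's first block address plus the
 * sizes of all preceding blocks, which are stored contiguously. */
static uint32_t block_address(uint32_t first_block, const uint16_t *index,
                              size_t n)
{
    uint32_t addr = first_block;
    for (size_t i = 0; i < n; i++)
        addr += block_disk_size(index[i]);
    return addr;
}
```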

Directories
-----------

Like inodes, directories are packed into the metadata blocks and are not
aligned on block boundaries; therefore directories can overlap compressed
blocks. A directory is, again, identified by a two field tuple
<start address of compressed block containing directory start : offset
into decompressed block>.

Directories are organised in a slightly complex way, and are not simply
a list of file names and inode tuples. The organisation takes advantage of the
observation that in most cases the inodes of the files in the directory
will be in the same compressed metadata block, and therefore the
inode tuples will have the same start block.

Directories are therefore organised as a two level list: a directory
header containing the shared start block value, followed by a sequence of
directory entries, all of which use that start block. A new directory header
is written whenever the inode start block changes. The directory
header/directory entry list is repeated as many times as necessary. The
organisation is as follows:

directory_header:
        count (8 bits)
        inode start block (24 bits)

directory entry (repeated count times):
        inode offset (13 bits)
        inode type (3 bits)
        filename size (8 bits)
        filename

This organisation saves on average 3 bytes per filename.
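The saving can be checked with a small size calculation. Each directory header is 4 bytes (8 + 24 bits) and each entry is 3 bytes (13 + 3 + 8 bits) plus its name. A flat layout repeating the full tuple in every entry would need 6 bytes plus the name. The struct and function names are hypothetical. Splitting a run after 255 entries is also an assumption, based on what an 8-bit count can hold if stored directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* In-memory directory entry before packing (hypothetical). */
struct dir_entry {
    uint32_t start_block;  /* inode tuple: compressed block start */
    uint16_t offset;       /* inode tuple: offset in the block */
    uint8_t type;          /* inode type */
    const char *name;
};

/* Bytes for the two level layout: a new 4-byte header whenever the
 * start block changes (or, by assumption, after 255 entries), then
 * 3 bytes plus the name for each entry. */
static size_t packed_dir_size(const struct dir_entry *e, size_t n)
{
    size_t size = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        if (run == 0 || run == 255 ||
            e[i].start_block != e[i - 1].start_block) {
            size += 4;  /* directory header */
            run = 0;
        }
        size += 3 + strlen(e[i].name);
        run++;
    }
    return size;
}

/* Bytes if every entry carried the full 24 + 13 + 3 + 8 bit tuple. */
static size_t flat_dir_size(const struct dir_entry *e, size_t n)
{
    size_t size = 0;
    for (size_t i = 0; i < n; i++)
        size += 6 + strlen(e[i].name);
    return size;
}
```

For a directory whose entries share one start block, each entry saves 3 bytes. The single 4-byte header is amortised over the run, so the average saving approaches 3 bytes per filename as runs get longer.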

File data
---------

File data is compressed on a block by block basis and written to the
filesystem. The filesystem supports blocks of up to 32K, which achieve
greater compression ratios than blocks of the 4K Linux page size.

The disadvantage of using blocks larger than 4K (and the reason why
most filesystems do not) is that the VFS reads data in 4K pages.
To satisfy a page read, the filesystem must read and decompress the larger
block containing that page (e.g. 32K). However, only 4K can be returned to
the VFS, resulting in a very inefficient filesystem, as the other 28K must be
thrown away. Squashfs solves this problem by explicitly pushing the extra
pages into the page cache.
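The page and block arithmetic behind this can be sketched as follows. The helper names are hypothetical. The sketch assumes a block size of 4K to 32K (a multiple of the 4K page size). It also ignores the last block of a file, which may cover fewer pages.

```c
#include <stdint.h>

#define PAGE_SIZE_4K 4096u

/* Index of the file block that holds a given 4K page. */
static uint32_t block_of_page(uint32_t page, uint32_t block_size)
{
    return (uint32_t)(((uint64_t)page * PAGE_SIZE_4K) / block_size);
}

/* First page and number of pages covered by a block. One of them is the
 * page the VFS asked for; the rest are the extra pages that would
 * otherwise be thrown away. */
static void pages_in_block(uint32_t block, uint32_t block_size,
                           uint32_t *first_page, uint32_t *count)
{
    *count = block_size / PAGE_SIZE_4K;
    *first_page = block * *count;
}
```

With 32K blocks, a read of page 13 decompresses block 1. That block covers pages 8 to 15, so seven pages besides the requested one are available for the page cache.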