|
| 1 | +--- |
| 2 | +date: 2018-06-02 |
| 3 | +title: "UnRAR an Archive" |
| 4 | +description: How to create and extract rar archives with Python and containers |
| 5 | +resources: |
| 6 | + - name: "Comparison of Archive Formats" |
| 7 | + link: https://en.wikipedia.org/wiki/Comparison_of_archive_formats#Containers_and_compression |
| 8 | + - name: "How to Open, Extract, and Create RAR files in Linux" |
| 9 | + link: https://www.tecmint.com/how-to-open-extract-and-create-rar-files-in-linux/ |
| 10 | + - name: "RAR on Wikipedia" |
| 11 | + link: https://en.wikipedia.org/wiki/RAR_(file_format) |
| 12 | + - name: "Other Python Modules for RAR (Stack Overflow)" |
| 13 | + link: https://stackoverflow.com/questions/17614467/how-can-unrar-a-file-with-python |
| 14 | + - name: singularityhub/rar GitHub repository |
| 15 | + link: https://www.github.com/singularityhub/rar |
| 16 | + - name: Singularity Hub Container |
| 17 | + link: https://www.singularity-hub.org/collections/1080 |
| 18 | + - name: "Singularity Containers" |
| 19 | + link: https://singularityware.github.io |
| 20 | +type: Document |
| 21 | +set: clusters |
| 22 | +set_order: 6 |
| 23 | +tags: [linux,python,archive] |
| 24 | +--- |
| 25 | + |
| 26 | +Today we are going to use Python to extract a <a href="https://en.wikipedia.org/wiki/RAR_(file_format)" target="_blank">RAR archive</a>. You may have never heard of this format, or vaguely remember something called |
| 27 | +<a href="https://en.wikipedia.org/wiki/WinRAR" target="_blank">WinRAR</a>. Keep in mind there are a |
| 28 | +<a href="https://en.wikipedia.org/wiki/Comparison_of_archive_formats#Containers_and_compression" target="_blank">crapton</a> |
| 29 | +of archive formats, and if you get to choose, you should pick the one that best fits the problem at hand. Today we will focus on RAR files, working with them first in Python and then with a Singularity container. Let's get started! |
| 30 | + |
| 31 | +## Why Containers or Python? |
| 32 | + |
| 33 | +> You are working on a shared resource where you can't install system tools for working with RAR, but you can install Python packages, or use Singularity containers! |
| 34 | + |
| 35 | +We are going to use some standard <a href="https://www.tecmint.com/how-to-open-extract-and-create-rar-files-in-linux/" target="_blank">command line Linux tools</a> to create a dummy archive to extract, and then demonstrate doing the extraction |
| 36 | +in Python. Yes, we _could_ just interact with the files using the Linux tools, but we are assuming that you can't install them. Perhaps you are a researcher who has downloaded RAR archives from the web, and you also want to interact with the contents in Python. |
| 37 | + |
| 38 | +## Create an Archive |
| 39 | +If you ever want to create a RAR archive on Linux (and please consider <a href="https://en.wikipedia.org/wiki/Comparison_of_archive_formats#Containers_and_compression" target="_blank">others first</a>), you can do this with "rar" and "unrar" (cue dinosaur "RAAAWR!"). |
| 40 | + |
| 41 | +```bash |
| 42 | +# Install in Debian/Ubuntu |
| 43 | +sudo apt-get install -y rar unrar |
| 44 | +``` |
| 45 | + |
| 46 | +Let's make a silly folder of useless files to turn into an archive. |
| 47 | + |
| 48 | +```bash |
| 49 | +mkdir -p noodles |
| 50 | +echo "This is what you live on in graduate school." >> noodles/ramen.txt |
| 51 | +echo "This is a meaty noodle that probably your Dad likes." >> noodles/udon.txt |
| 52 | +echo "Garfield approved, best with a napkin." >> noodles/lasagna.txt |
| 53 | +mkdir -p noodles/sauce |
| 54 | +touch noodles/sauce/marinara |
| 55 | +touch noodles/sauce/cheese |
| 56 | +touch noodles/sauce/alfredo |
| 57 | +``` |
| 58 | + |
| 59 | +Oh yeah, we have a nice little thing going on here! |
| 60 | + |
| 61 | +```bash |
| 62 | +$ tree noodles/ |
| 63 | +noodles/ |
| 64 | +├── lasagna.txt |
| 65 | +├── ramen.txt |
| 66 | +├── sauce |
| 67 | +│ ├── alfredo |
| 68 | +│ ├── cheese |
| 69 | +│ └── marinara |
| 70 | +└── udon.txt |
| 71 | + |
| 72 | +1 directory, 6 files |
| 73 | +``` |
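If you would rather build the same dummy folder from Python, here is a minimal sketch using only the standard library. The function name `make_noodles` is my own invention for illustration, and unlike the shell version (which appends with `>>`), it overwrites the text files each time it runs.

```python
from pathlib import Path


def make_noodles(base="noodles"):
    """Create the sample folder used in this post and return its path."""
    base = Path(base)
    sauce = base / "sauce"
    # parents=True and exist_ok=True mirror `mkdir -p`
    sauce.mkdir(parents=True, exist_ok=True)
    notes = {
        "ramen.txt": "This is what you live on in graduate school.",
        "udon.txt": "This is a meaty noodle that probably your Dad likes.",
        "lasagna.txt": "Garfield approved, best with a napkin.",
    }
    for name, text in notes.items():
        # echo adds a trailing newline, so we add one too
        (base / name).write_text(text + "\n")
    for name in ("marinara", "cheese", "alfredo"):
        # same as `touch`: create an empty file if it is missing
        (sauce / name).touch()
    return base
```

Calling `make_noodles()` with no arguments creates `noodles/` in the present working directory, which you can then hand to rar.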
| 74 | + |
| 75 | +We can now create the archive with "rar": |
| 76 | + |
| 77 | +```bash |
| 78 | +$ rar a noodles.rar noodles |
| 79 | + |
| 80 | +RAR 5.30 beta 2 Copyright (c) 1993-2015 Alexander Roshal 4 Aug 2015 |
| 81 | +Trial version Type RAR -? for help |
| 82 | + |
| 83 | +Evaluation copy. Please register. |
| 84 | + |
| 85 | +Creating archive noodles.rar |
| 86 | + |
| 87 | +Adding noodles/sauce/alfredo OK |
| 88 | +Adding noodles/sauce/cheese OK |
| 89 | +Adding noodles/sauce/marinara OK |
| 90 | +Adding noodles/udon.txt OK |
| 91 | +Adding noodles/ramen.txt OK |
| 92 | +Adding noodles/lasagna.txt OK |
| 93 | +Adding noodles/sauce OK |
| 94 | +Adding noodles 1% |
| 95 | +Done |
| 96 | +``` |
| 97 | + |
| 98 | +Great! We now have an archive to work with. The `a` command tells rar to add files to an archive. You should be comfortable reading the man (manual) pages to learn more about rar. Try this: |
| 99 | + |
| 100 | +```bash |
| 101 | +$ man rar |
| 102 | +``` |
| 103 | + |
| 104 | +## Extract in Python |
| 105 | + |
| 106 | +There are actually a <a href="https://stackoverflow.com/questions/17614467/how-can-unrar-a-file-with-python" target="_blank">couple of ways to do this</a>, and I'll use a module that I know called "patool." |
| 107 | +<a href="https://pypi.org/project/patool/" target="_blank">Check it out</a> because it handles |
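| 108 | +many more kinds of archives than rar. Keep in mind that patool drives an archive program that is already on the system (in the output further down it runs `/usr/bin/rar`), so one has to be available. Here is how to install it: |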
| 109 | + |
| 110 | +```bash |
| 111 | + |
| 112 | +pip install patool |
| 113 | + |
| 114 | +# On a shared resource |
| 115 | +pip install patool --user |
| 116 | +``` |
| 117 | + |
| 118 | +Now let's open up Python! Note that noodles.rar is in our present working directory. |
| 119 | + |
| 120 | +```python |
| 121 | +rarfile = 'noodles.rar' |
| 122 | +``` |
| 123 | + |
| 124 | +The simplest way to extract the archive is to create a new folder in the present working |
| 125 | +directory and extract into it, which would look something like this: |
| 126 | + |
| 127 | +```python |
| 128 | + |
| 129 | +import os |
| 130 | +from patoolib import extract_archive |
| 131 | +extract_to = 'new-noodles' |
| 132 | +os.makedirs(extract_to, exist_ok=True)  # don't fail if the folder already exists |
| 133 | +extract_archive(rarfile, outdir=extract_to) |
| 134 | + |
| 135 | +# patool: Extracting noodles.rar ... |
| 136 | +# patool: running /usr/bin/rar x -- /home/vanessa/Documents/Dropbox/Code/shub/containers/rar/noodles.rar |
| 137 | +# patool: with cwd='new-noodles' |
| 138 | +# patool: ... noodles.rar extracted to `new-noodles'. |
| 139 | +``` |
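The patool output above shows it running `/usr/bin/rar`, so it can be handy to check which archive programs are on your path before trying an extraction. Here is a small sketch for that. The helper name `find_extractors` and the candidate list (`unrar`, `rar`, `7z`) are my own assumptions for illustration, and are not taken from patool's search logic.

```python
import shutil


def find_extractors(candidates=("unrar", "rar", "7z")):
    """Return a dict of program name -> full path for each candidate on PATH."""
    found = {}
    for name in candidates:
        path = shutil.which(name)  # None when the program is not on PATH
        if path is not None:
            found[name] = path
    return found
```

If `find_extractors()` comes back empty, the extraction above will probably fail, and the container approach later in this post is likely the better route.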
| 140 | + |
| 141 | +Now you probably want to iterate over the extracted files. How can you do that? Let's write a quick function! This |
| 142 | +one returns a list: |
| 143 | + |
| 144 | +```python |
| 145 | +def list_files(base): |
| 146 | + files = [] |
| 147 | + for root, dirnames, filenames in os.walk(base): |
| 148 | + for filename in filenames: |
| 149 | + files.append(os.path.join(root, filename)) |
| 150 | + return files |
| 151 | +``` |
| 152 | + |
| 153 | +Test it out! |
| 154 | + |
| 155 | +``` |
| 156 | + |
| 157 | +In [11]: list_files(extract_to) |
| 158 | +Out[11]: |
| 159 | +['new-noodles/noodles/udon.txt', |
| 160 | + 'new-noodles/noodles/ramen.txt', |
| 161 | + 'new-noodles/noodles/lasagna.txt', |
| 162 | + 'new-noodles/noodles/sauce/alfredo', |
| 163 | + 'new-noodles/noodles/sauce/cheese', |
| 164 | + 'new-noodles/noodles/sauce/marinara'] |
| 165 | +``` |
| 166 | + |
| 167 | +And here is a function that returns an iterator (a generator). You might want this for |
| 168 | +larger listings, since it yields one path at a time instead of building the whole list in memory. |
| 169 | + |
| 170 | +```python |
| 171 | + |
| 172 | +def iter_files(base): |
| 173 | + for root, dirnames, filenames in os.walk(base): |
| 174 | + for filename in filenames: |
| 175 | + yield os.path.join(root, filename) |
| 176 | +``` |
| 177 | +``` |
| 178 | +for filename in iter_files(extract_to): |
| 179 | + ...: print('I found a file %s!' %filename) |
| 180 | + ...: |
| 181 | +I found a file new-noodles/noodles/udon.txt! |
| 182 | +I found a file new-noodles/noodles/ramen.txt! |
| 183 | +I found a file new-noodles/noodles/lasagna.txt! |
| 184 | +I found a file new-noodles/noodles/sauce/alfredo! |
| 185 | +I found a file new-noodles/noodles/sauce/cheese! |
| 186 | +I found a file new-noodles/noodles/sauce/marinara! |
| 187 | +``` |
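Since the point of all this was to interact with the contents in Python, here is a sketch that walks the extracted folder the same way and reads the text files. The function name `read_text_files` and the `suffix` parameter are hypothetical names I chose for the example.

```python
import os


def read_text_files(base, suffix=".txt"):
    """Map the path of each file under base ending in suffix to its text."""
    contents = {}
    for root, dirnames, filenames in os.walk(base):
        for filename in filenames:
            if filename.endswith(suffix):
                path = os.path.join(root, filename)
                with open(path, "r") as handle:
                    contents[path] = handle.read()
    return contents
```

With the archive from this post, `read_text_files(extract_to)` should give you the three noodle descriptions keyed by path, and skip the empty sauce files because they have no `.txt` extension.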
| 188 | + |
| 189 | +The paths above are relative. Use `os.path.abspath(os.path.join(root, filename))` |
| 190 | +if you want full paths. Which one you need depends on what you are trying to do. |
| 191 | + |
| 192 | +```python |
| 193 | +def list_fullpaths(base): |
| 194 | + files = [] |
| 195 | + for root, dirnames, filenames in os.walk(base): |
| 196 | + for filename in filenames: |
| 197 | + files.append(os.path.abspath(os.path.join(root, filename))) |
| 198 | + return files |
| 199 | +``` |
| 200 | +``` |
| 201 | +In [15]: list_fullpaths(extract_to) |
| 202 | +Out[15]: |
| 203 | +['/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/udon.txt', |
| 204 | + '/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/ramen.txt', |
| 205 | + '/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/lasagna.txt', |
| 206 | + '/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/sauce/alfredo', |
| 207 | + '/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/sauce/cheese', |
| 208 | + '/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/sauce/marinara'] |
| 209 | +``` |
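If you prefer `pathlib`, the same full-path listing can be sketched in a couple of lines. This is an alternative I am adding for comparison (the name `list_fullpaths_pathlib` is made up), and it differs slightly from the version above: `resolve()` follows symbolic links, while `os.path.abspath` does not, and the result is sorted.

```python
from pathlib import Path


def list_fullpaths_pathlib(base):
    """Return sorted absolute paths (as strings) of all files under base."""
    base = Path(base).resolve()
    # rglob("*") walks recursively; is_file() drops the directories
    return sorted(str(path) for path in base.rglob("*") if path.is_file())
```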
| 210 | + |
| 211 | +## Rar-ing with a Container |
| 212 | +Let's say you don't want to deal with Python, but you still can't install rar or unrar. You can use a container! |
| 213 | +If you aren't familiar with Singularity, it's a container technology (like Docker) designed to run on shared resources. |
| 214 | +You can install it with <a href="https://singularityware.github.io/install-linux" target="_blank">these instructions</a>. |
| 215 | + |
| 216 | +[Singularity Hub Container](https://singularity-hub.org/collections/1080) |
| 217 | + |
| 218 | +The container is hosted on Singularity Hub, and built from the <a href="https://www.github.com/singularityhub/rar" target="_blank">repository here</a>. |
| 219 | + |
| 220 | + |
| 221 | +```bash |
| 222 | +$ singularity pull --name rar.simg shub://singularityhub/rar |
| 223 | +``` |
| 224 | + |
| 225 | + |
| 226 | +If you are pulling on a shared resource, make sure to export your `SINGULARITY_CACHEDIR` first, as |
| 227 | +caching in `$HOME` can fill up the quota almost immediately `#fatcontainers`. Here `$SCRATCH` is assumed to point to your scratch space. |
| 228 | + |
| 229 | +```bash |
| 230 | +$ export SINGULARITY_CACHEDIR=$SCRATCH/.singularity |
| 231 | +$ singularity pull --name rar.simg shub://singularityhub/rar |
| 232 | +``` |
| 233 | + |
| 234 | +You can always ask the container for help before blindly running it. |
| 235 | + |
| 236 | +```bash |
| 237 | +$ singularity help rar.simg |
| 238 | + |
| 239 | + |
| 240 | + This container provides the rar and unrar utilities. Running the container |
| 241 | + as is is akin to running the rar utility on Linux: |
| 242 | + ./rar.simg --help |
| 243 | + |
| 244 | + If you want to create or extract an archive, you can also use one of the apps |
| 245 | + provided: |
| 246 | + |
| 247 | + singularity apps rar.simg |
| 248 | + create |
| 249 | + extract |
| 250 | + |
| 251 | + Or ask for help for usage for one: |
| 252 | + |
| 253 | + $ singularity help --app create rar.simg |
| 254 | + $ singularity help --app extract rar.simg |
| 255 | + |
| 256 | +``` |
| 257 | + |
| 258 | +## Create and Extract An Archive |
| 259 | +Here is how to create an archive using the Singularity container: |
| 260 | + |
| 261 | +```bash |
| 262 | +$ singularity run --app create rar.simg archive.rar noodles/ |
| 263 | +``` |
| 264 | + |
| 265 | +And now extract the same one, but somewhere else. |
| 266 | + |
| 267 | +```bash |
| 268 | +$ singularity run --app extract rar.simg archive.rar another-noodles |
| 269 | +``` |
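If you do end up wanting to drive the container from Python after all, the same `singularity run --app` commands can be wrapped with `subprocess`. This is only a sketch: the helper names `build_app_command` and `run_app` are hypothetical, and it assumes `singularity` is on your path and that `rar.simg` was pulled as shown above.

```python
import shutil
import subprocess


def build_app_command(app, image, *args):
    """Build the argument list for `singularity run --app <app> <image> <args>`."""
    return ["singularity", "run", "--app", app, image, *args]


def run_app(app, image, *args):
    """Run a container app; raises CalledProcessError on a non-zero exit."""
    if shutil.which("singularity") is None:
        raise RuntimeError("singularity was not found on PATH")
    return subprocess.run(build_app_command(app, image, *args), check=True)
```

For example, `run_app("extract", "rar.simg", "archive.rar", "another-noodles")` would issue the extract command shown above. Passing a list of arguments (rather than one shell string) avoids quoting trouble with file names that contain spaces.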
| 270 | + |
| 271 | +## Advanced Usage |
| 272 | +You can interact with unrar and rar in the containers directly if you use "exec". |
| 273 | +Here are the same commands, but we are calling the executables directly! |
| 274 | + |
| 275 | +```bash |
| 276 | +# Create |
| 277 | +$ singularity exec rar.simg rar a new-archive.rar noodles/ |
| 278 | + |
| 279 | +# Extract |
| 280 | +$ singularity exec rar.simg unrar x new-archive.rar more-noodles/ |
| 281 | + |
| 282 | +UNRAR 5.30 beta 2 freeware Copyright (c) 1993-2015 Alexander Roshal |
| 283 | + |
| 284 | + |
| 285 | +Extracting from new-archive.rar |
| 286 | + |
| 287 | +Creating more-noodles OK |
| 288 | +Creating more-noodles/noodles OK |
| 289 | +Extracting more-noodles/noodles/udon.txt OK |
| 290 | +Extracting more-noodles/noodles/ramen.txt OK |
| 291 | +Extracting more-noodles/noodles/lasagna.txt OK |
| 292 | +All OK |
| 293 | +``` |
| 294 | + |
| 295 | +Finally, if you want to work inside the image, you can use "shell": |
| 296 | + |
| 297 | +```bash |
| 298 | +$ singularity shell rar.simg |
| 299 | +Singularity: Invoking an interactive shell within container... |
| 300 | + |
| 301 | +Singularity rar.simg:~/Documents/Dropbox/Code/shub/containers/rar> which rar |
| 302 | +/usr/bin/rar |
| 303 | +Singularity rar.simg:~/Documents/Dropbox/Code/shub/containers/rar> which unrar |
| 304 | +/usr/bin/unrar |
| 305 | +``` |
| 306 | + |
| 307 | +Did you enjoy this post? Please <a href="https://github.com/vsoch/lessons" target="_blank">star the repository on GitHub</a> to show support! If you have a question for the dinosaur debugger, or would like to request a lesson, you can |
| 308 | +<a href="https://github.com/vsoch/lessons/issues" target="_blank">do that here.</a> |
| 309 | + |
| 309 | +Can you make this post better? <a href="{{ site.repo }}/issues" target="_blank">Submit an issue</a> or, |
| 310 | +better yet, make the edit yourself with a pull request <a href="{{ site.repo }}/issues" target="_blank">to the repository!</a> |