Commit dda62b2

adding rar RAWR post!
1 parent d9da642 commit dda62b2

3 files changed: +321 -0 lines changed

Lines changed: 311 additions & 0 deletions

---
date: 2018-06-02
title: "UnRAR an Archive"
description: How to create and extract rar archives with Python and containers
resources:
 - name: "Comparison of Archive Formats"
   link: https://en.wikipedia.org/wiki/Comparison_of_archive_formats#Containers_and_compression
 - name: "How to Open, Extract, and Create RAR files in Linux"
   link: https://www.tecmint.com/how-to-open-extract-and-create-rar-files-in-linux/
 - name: "RAR on Wikipedia"
   link: https://en.wikipedia.org/wiki/RAR_(file_format)
 - name: "Other Python Modules for RAR (Stack Overflow)"
   link: https://stackoverflow.com/questions/17614467/how-can-unrar-a-file-with-python
 - name: singularityhub/rar Github repository
   link: https://www.github.com/singularityhub/rar
 - name: Singularity Hub Container
   link: https://www.singularity-hub.org/collections/1080
 - name: "Singularity Containers"
   link: https://singularityware.github.io
type: Document
set: clusters
set_order: 6
tags: [linux,python,archive]
---

Today we are going to use Python to extract a <a href="https://en.wikipedia.org/wiki/RAR_(file_format)" target="_blank">RAR archive</a>. You may never have heard of this format, or you may vaguely remember something called
<a href="https://en.wikipedia.org/wiki/WinRAR" target="_blank">WinRAR</a>. Keep in mind there are a
<a href="https://en.wikipedia.org/wiki/Comparison_of_archive_formats#Containers_and_compression" target="_blank">crapton</a>
of archive formats, and if you get to choose, pick the one best suited to the problem at hand. Today we will focus on RAR files, working with them first in Python and then with a Singularity container. Let's get started!

## Why Containers or Python?

> You are working on a shared resource where you can't install system tools for working with RAR, but you can install Python packages, or use Singularity containers!

We are going to use some standard <a href="https://www.tecmint.com/how-to-open-extract-and-create-rar-files-in-linux/" target="_blank">command line Linux tools</a> to create a dummy archive, and then demonstrate extracting it
in Python. Yes, we _could_ just work with the files using the Linux tools, but we are assuming you can't install them. Perhaps you are a researcher who has downloaded RAR archives from a web address and also wants to work with the contents in Python.

## Create an Archive
If you ever want to create a RAR archive on Linux (and please consider <a href="https://en.wikipedia.org/wiki/Comparison_of_archive_formats#Containers_and_compression" target="_blank">others first</a>), you can use "rar" to create it and "unrar" to extract it (cue dinosaur: "RAAAWR!").

```bash
# Install on Debian/Ubuntu (the packages live in the non-free/multiverse repositories)
sudo apt-get install -y rar unrar
```

Let's make a silly folder of useless files to turn into an archive.

```bash
mkdir -p noodles
echo "This is what you live on in graduate school." >> noodles/ramen.txt
echo "This is a meaty noodle that probably your Dad likes." >> noodles/udon.txt
echo "Garfield approved, best with a napkin." >> noodles/lasagna.txt
mkdir -p noodles/sauce
touch noodles/sauce/marinara
touch noodles/sauce/cheese
touch noodles/sauce/alfredo
```

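If you would rather build the same folder without leaving Python, here's a minimal sketch using the standard library's `pathlib`. `make_noodles` is a hypothetical helper, not part of any library; calling `make_noodles()` creates `noodles/` in the current directory, just like the bash commands.

```python
from pathlib import Path


def make_noodles(base="noodles"):
    """Recreate the dummy 'noodles' folder from the bash example (hypothetical helper)."""
    root = Path(base)
    sauce = root / "sauce"
    # parents=True and exist_ok=True mirror `mkdir -p`
    sauce.mkdir(parents=True, exist_ok=True)

    notes = {
        "ramen.txt": "This is what you live on in graduate school.",
        "udon.txt": "This is a meaty noodle that probably your Dad likes.",
        "lasagna.txt": "Garfield approved, best with a napkin.",
    }
    for name, text in notes.items():
        # Append mode matches `echo ... >>`; echo also adds a trailing newline
        with open(root / name, "a") as handle:
            handle.write(text + "\n")

    for name in ("marinara", "cheese", "alfredo"):
        # Path.touch behaves like `touch`: create if missing, keep existing content
        (sauce / name).touch()
    return root
```
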

Oh yeah, we have a nice little thing going on here!

```bash
$ tree noodles/
noodles/
├── lasagna.txt
├── ramen.txt
├── sauce
│   ├── alfredo
│   ├── cheese
│   └── marinara
└── udon.txt

1 directory, 6 files
```

We can now create the archive with "rar":

```bash
$ rar a noodles.rar noodles

RAR 5.30 beta 2 Copyright (c) 1993-2015 Alexander Roshal 4 Aug 2015
Trial version Type RAR -? for help

Evaluation copy. Please register.

Creating archive noodles.rar

Adding noodles/sauce/alfredo OK
Adding noodles/sauce/cheese OK
Adding noodles/sauce/marinara OK
Adding noodles/udon.txt OK
Adding noodles/ramen.txt OK
Adding noodles/lasagna.txt OK
Adding noodles/sauce OK
Adding noodles 1%
Done
```

Great! We now have an archive to work with. It's a safe guess that `a` means "add files to an archive." To learn more about rar, get comfortable reading its man (manual) page:

```bash
$ man rar
```

## Extract in Python

There are actually a <a href="https://stackoverflow.com/questions/17614467/how-can-unrar-a-file-with-python" target="_blank">couple of ways to do this</a>, and I'll use a module that I know called "patool."
<a href="https://pypi.org/project/patool/" target="_blank">Check it out</a>, because it handles
many more kinds of archives than just RAR. Here is how to install it:

```bash
pip install patool

# On a shared resource, install into your user directory instead
pip install patool --user
```

Now let's open up Python! Note that noodles.rar is in our present working directory.

```python
rarfile = 'noodles.rar'
```

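Since the archives may come straight from a web address, a quick sanity check before extracting can save some confusion: RAR files start with a fixed signature. Here's a minimal sketch that reads the first bytes; `rar_version` is a hypothetical helper, and the byte strings are the documented signatures for RAR 1.5 to 4.x and for RAR 5.0 and later.

```python
# RAR 1.5 to 4.x archives start with these 7 bytes...
RAR4_SIGNATURE = b"Rar!\x1a\x07\x00"
# ...and RAR 5.0+ archives start with these 8 bytes.
RAR5_SIGNATURE = b"Rar!\x1a\x07\x01\x00"


def rar_version(path):
    """Return 'rar4', 'rar5', or None based on the file's leading bytes (hypothetical helper)."""
    with open(path, "rb") as handle:
        head = handle.read(len(RAR5_SIGNATURE))
    if head.startswith(RAR5_SIGNATURE):
        return "rar5"
    if head.startswith(RAR4_SIGNATURE):
        return "rar4"
    # Not a plain RAR file, or the signature is not at offset 0
    return None
```

`rar_version(rarfile)` returns `None` for anything that doesn't start with either signature. That includes self-extracting (SFX) archives, whose signature comes after an executable stub, so treat `None` as "look closer" rather than "definitely not RAR."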

The simplest thing to do is to create a new folder in the present working directory and extract the archive into it:

```python
import os
from patoolib import extract_archive

# rarfile = 'noodles.rar' was defined above
extract_to = 'new-noodles'
# Create the destination folder; exist_ok=True makes re-running safe
os.makedirs(extract_to, exist_ok=True)
extract_archive(rarfile, outdir=extract_to)

# Output:
# patool: Extracting noodles.rar ...
# patool: running /usr/bin/rar x -- /home/vanessa/Documents/Dropbox/Code/shub/containers/rar/noodles.rar
# patool: with cwd='new-noodles'
# patool: ... noodles.rar extracted to `new-noodles'.
```

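As the log above shows, patool did not decode the archive itself: it ran `/usr/bin/rar x`. patool hands the work to an archiver program installed on the system, so on a locked-down machine it is worth checking which ones are available. A minimal sketch using `shutil.which`; the candidate list and the `find_rar_backends` name are assumptions for illustration, so check patool's documentation for the programs it actually supports.

```python
import shutil

# Assumption: command-line programs that can typically read RAR files.
# patool's documentation lists the exact helpers it knows how to call.
RAR_CAPABLE_PROGRAMS = ("rar", "unrar", "7z", "unar")


def find_rar_backends(programs=RAR_CAPABLE_PROGRAMS):
    """Return {program: absolute path} for each candidate found on PATH (hypothetical helper)."""
    found = {}
    for program in programs:
        path = shutil.which(program)
        if path is not None:
            found[program] = path
    return found
```

If `find_rar_backends()` returns an empty dict, patool has nothing to delegate RAR extraction to, and the container route later in this post is the way to go.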

Now you probably want to iterate over the extracted files. How can you do that? Let's write a quick function! This
one returns a list:

```python
import os

def list_files(base):
    '''Walk base recursively and return a list of every file path.'''
    files = []
    for root, dirnames, filenames in os.walk(base):
        for filename in filenames:
            files.append(os.path.join(root, filename))
    return files
```

Test it out!

```
In [11]: list_files(extract_to)
Out[11]:
['new-noodles/noodles/udon.txt',
 'new-noodles/noodles/ramen.txt',
 'new-noodles/noodles/lasagna.txt',
 'new-noodles/noodles/sauce/alfredo',
 'new-noodles/noodles/sauce/cheese',
 'new-noodles/noodles/sauce/marinara']
```

And here is a version that returns a generator instead. It yields one path at a time rather than building the whole list in memory first, which can help with very large listings.

```python
def iter_files(base):
    '''Walk base recursively, yielding one file path at a time.'''
    for root, dirnames, filenames in os.walk(base):
        for filename in filenames:
            yield os.path.join(root, filename)
```

```
for filename in iter_files(extract_to):
    ...:     print('I found a file %s!' % filename)
    ...:
I found a file new-noodles/noodles/udon.txt!
I found a file new-noodles/noodles/ramen.txt!
I found a file new-noodles/noodles/lasagna.txt!
I found a file new-noodles/noodles/sauce/alfredo!
I found a file new-noodles/noodles/sauce/cheese!
I found a file new-noodles/noodles/sauce/marinara!
```

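To see the generator's advantage in action, here's a sketch that reads only the first few `.txt` files and then stops, which also gets at the original goal of working with the archive's contents in Python. It repeats `iter_files` so it runs on its own; `preview_text_files` and its `limit` parameter are hypothetical names.

```python
import os
from itertools import islice


def iter_files(base):
    """Yield file paths under base one at a time (same as above)."""
    for root, dirnames, filenames in os.walk(base):
        for filename in filenames:
            yield os.path.join(root, filename)


def preview_text_files(base, limit=3):
    """Read at most `limit` .txt files, returning {path: stripped contents}."""
    contents = {}
    txt_files = (path for path in iter_files(base) if path.endswith(".txt"))
    # islice stops pulling from the generator after `limit` items,
    # so the walk can end early instead of listing everything first.
    for path in islice(txt_files, limit):
        with open(path) as handle:
            contents[path] = handle.read().strip()
    return contents
```

With the extracted archive, `preview_text_files(extract_to, limit=2)` would read two of the three noodle descriptions; which two depends on the order `os.walk` returns them.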

The paths above are relative to the current working directory. If you need absolute paths, use `os.path.abspath(os.path.join(root, filename))` instead; which one you want depends on what you are trying to do.

```python
import os

def list_fullpaths(base):
    '''Like list_files, but returns absolute paths.'''
    files = []
    for root, dirnames, filenames in os.walk(base):
        for filename in filenames:
            files.append(os.path.abspath(os.path.join(root, filename)))
    return files
```

```
In [15]: list_fullpaths(extract_to)
Out[15]:
['/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/udon.txt',
 '/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/ramen.txt',
 '/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/lasagna.txt',
 '/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/sauce/alfredo',
 '/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/sauce/cheese',
 '/home/vanessa/Documents/Dropbox/Code/shub/containers/rar/new-noodles/noodles/sauce/marinara']
```

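If you prefer `pathlib` (Python 3.4+), the same listing fits in one comprehension. A sketch with a hypothetical function name; note that `Path.resolve()` also follows symbolic links, unlike `os.path.abspath`, and the order of results may differ from `os.walk`.

```python
from pathlib import Path


def list_fullpaths_pathlib(base):
    """Absolute paths of every file under base, like list_fullpaths above."""
    # rglob("*") walks the tree recursively; keep only regular files.
    return [str(path.resolve()) for path in Path(base).rglob("*") if path.is_file()]
```
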

## Rar-ing with a Container
Let's say you don't want to deal with Python, but you still can't install rar or unrar. You can use a container!
If you aren't familiar with Singularity, it's a container technology (like Docker) that you can use on a shared resource.
You can install it with <a href="https://singularityware.github.io/install-linux" target="_blank">these instructions</a>.

[![https://www.singularity-hub.org/static/img/hosted-singularity--hub-%23e32929.svg](https://www.singularity-hub.org/static/img/hosted-singularity--hub-%23e32929.svg)](https://singularity-hub.org/collections/1080)

The container is hosted on Singularity Hub, and built from the <a href="https://www.github.com/singularityhub/rar" target="_blank">repository here</a>.

```bash
$ singularity pull --name rar.simg shub://singularityhub/rar
```

If you are working on a shared resource, make sure to export `SINGULARITY_CACHEDIR` before pulling: otherwise images are cached under your `$HOME`, and pulling there can fill up your quota almost immediately `#fatcontainers`.

```bash
$ export SINGULARITY_CACHEDIR=$SCRATCH/.singularity
$ singularity pull --name rar.simg shub://singularityhub/rar
```

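If you would rather drive the pull from Python, here's a sketch that prepares the environment for a `subprocess` call. It assumes, as the bash example does, that your cluster sets `$SCRATCH`; `singularity_env` and `pull_rar_image` are hypothetical helper names, and the command is exactly the one shown above.

```python
import os
import subprocess


def singularity_env(scratch_var="SCRATCH"):
    """Copy of the environment with SINGULARITY_CACHEDIR pointed at scratch space.

    Assumes the cluster exposes scratch space via $SCRATCH (as in the bash
    example); if that variable is unset, the environment is left unchanged.
    """
    env = dict(os.environ)
    scratch = env.get(scratch_var)
    if scratch:
        env["SINGULARITY_CACHEDIR"] = os.path.join(scratch, ".singularity")
    return env


def pull_rar_image(name="rar.simg", uri="shub://singularityhub/rar"):
    """Run the same `singularity pull` command shown above, using the scratch cache."""
    cmd = ["singularity", "pull", "--name", name, uri]
    subprocess.run(cmd, env=singularity_env(), check=True)
```
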

You can always ask the container for help before blindly running it.

```bash
$ singularity help rar.simg


This container provides the rar and unrar utilities. Running the container
as is is akin to running the rar utility on Linux:
./rar.simg --help

If you want to create or extract an archive, you can also use one of the apps
provided:

singularity apps rar.simg
create
extract

Or ask for help for usage for one:

$ singularity help --app create rar.simg
$ singularity help --app extract rar.simg

```

## Create and Extract an Archive
Here is how to create an archive using the Singularity container:

```bash
$ singularity run --app create rar.simg archive.rar noodles/
```

Now extract the same archive, but somewhere else:

```bash
$ singularity run --app extract rar.simg archive.rar another-noodles
```

## Advanced Usage
You can also interact with rar and unrar in the container directly by using "exec".
Here are the same commands, calling the executables directly:

```bash
# Create
$ singularity exec rar.simg rar a new-archive.rar noodles/

# Extract
$ singularity exec rar.simg unrar x new-archive.rar more-noodles/

UNRAR 5.30 beta 2 freeware Copyright (c) 1993-2015 Alexander Roshal


Extracting from new-archive.rar

Creating more-noodles OK
Creating more-noodles/noodles OK
Extracting more-noodles/noodles/udon.txt OK
Extracting more-noodles/noodles/ramen.txt OK
Extracting more-noodles/noodles/lasagna.txt OK
All OK
```

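The same `exec` call can be driven from Python with `subprocess`, which is handy if you want the container to do the extraction but keep working with the results in Python. A sketch; `unrar_in_container` is a hypothetical helper, and the command mirrors the extract example above. unrar expects the destination to end with a path separator (which is why the example writes `more-noodles/`), so the helper adds one if it is missing.

```python
import subprocess


def unrar_in_container(archive, outdir, image="rar.simg", run=True):
    """Extract `archive` into `outdir` with unrar inside the container.

    Mirrors: singularity exec rar.simg unrar x new-archive.rar more-noodles/
    Returns the command list; pass run=False to build it without executing.
    """
    # Ensure a trailing slash so unrar treats outdir as the destination folder
    destination = outdir if outdir.endswith("/") else outdir + "/"
    cmd = ["singularity", "exec", image, "unrar", "x", archive, destination]
    if run:
        subprocess.run(cmd, check=True)
    return cmd
```

After `unrar_in_container("new-archive.rar", "more-noodles")` finishes, the `list_files` helper from earlier works on `more-noodles` just as it did on `new-noodles`.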

Finally, if you want to work inside the image interactively, use "shell":

```bash
$ singularity shell rar.simg
Singularity: Invoking an interactive shell within container...

Singularity rar.simg:~/Documents/Dropbox/Code/shub/containers/rar> which rar
/usr/bin/rar
Singularity rar.simg:~/Documents/Dropbox/Code/shub/containers/rar> which unrar
/usr/bin/unrar
```

Did you enjoy this post? Please <a href="https://github.com/vsoch/lessons" target="_blank">star the repository on GitHub</a> to show support! If you have a question for the dinosaur debugger, or would like to request a lesson, you can
<a href="https://github.com/vsoch/lessons/issues" target="_blank">do that here</a>.

Can you make this post better? <a href="{{ site.repo }}/issues" target="_blank">Submit an issue</a> or,
better yet, make the edit yourself with a pull request <a href="{{ site.repo }}/issues" target="_blank">to the repository</a>!

pages/tags/archives.md

Lines changed: 5 additions & 0 deletions
---
layout: tag
tag: archives
permalink: /archives
---

pages/tags/python.md

Lines changed: 5 additions & 0 deletions
---
layout: tag
tag: python
permalink: /python
---
