Commit 8b99355

initial commit 1 - readme/license
0 parents commit 8b99355

3 files changed, +103 -0 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
/target
/data

LICENSE.txt

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2024 Alex "mcmonkey" Goodwin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
# mcmonkey's AI Translation Tool

Bulk translates everything in a reference file using local translation AI.

Built using https://github.com/huggingface/candle. By default it uses this model: https://huggingface.co/jbochi/madlad400-7b-mt-bt in GGUF-q4, which is derived from https://huggingface.co/google/madlad400-7b-mt-bt

## Usage

First compile:
```sh
cargo build --release
```

Then run:

```sh
./target/release/translate-tool.exe --in-json "data/test-in.json" --out-json "data/test-out.json" --language de
```

Tack `--verbose` onto the end to get some live debug output as it goes.

Use `--model-id jbochi/madlad400-3b-mt` if you're impatient and want a smaller model.

On an Intel i7-12700KF, 7b-mt-bt runs at around 1 token/s, 3b-mt runs at around 2.8 token/s.
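
To get a feel for what those rates mean for a bulk job, here's a rough back-of-envelope sketch. The workload (500 strings at about 20 output tokens each) is made up for illustration, and `estimate_seconds` is a hypothetical helper, not part of this tool. Only the 1 token/s and 2.8 token/s figures come from the measurements above, and the estimate ignores the time spent encoding the input.

```rust
/// Hypothetical helper (not part of this tool): rough wall-clock estimate
/// for generating `total_tokens` output tokens at `tokens_per_sec`.
fn estimate_seconds(total_tokens: u64, tokens_per_sec: f64) -> f64 {
    total_tokens as f64 / tokens_per_sec
}

fn main() {
    // Assumption for illustration: 500 strings at ~20 output tokens each.
    let total_tokens = 500 * 20;

    // Rates quoted above for an Intel i7-12700KF.
    let slow = estimate_seconds(total_tokens, 1.0); // 7b-mt-bt
    let fast = estimate_seconds(total_tokens, 2.8); // 3b-mt

    println!("7b-mt-bt: ~{:.1} hours", slow / 3600.0);
    println!("3b-mt:    ~{:.1} hours", fast / 3600.0);
}
```

With those made-up numbers the 7b model lands at roughly 2.8 hours and the 3b model at about 1 hour, so plan accordingly.
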

Example input JSON file:
```json
{
    "keys": {
        "This key needs translation": "",
        "This key doesn't": "cause it has a value"
    }
}
```

This will translate keys and store the result in the value, skipping any keys that already have a value.
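
The fill-in rule described above can be sketched like this. It's a minimal illustration, not the tool's actual code: `fill_missing` is a hypothetical name, the `translate` closure stands in for the model, and a plain `BTreeMap` stands in for the parsed `keys` object so the sketch runs with no extra crates.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: translate every key whose value is empty and store
/// the result as its value. Entries that already have a value are skipped.
/// Returns how many entries were translated.
fn fill_missing<F>(keys: &mut BTreeMap<String, String>, mut translate: F) -> usize
where
    F: FnMut(&str) -> String,
{
    let mut translated = 0;
    for (key, value) in keys.iter_mut() {
        if value.is_empty() {
            *value = translate(key.as_str());
            translated += 1;
        }
    }
    translated
}

fn main() {
    let mut keys = BTreeMap::new();
    keys.insert("Save".to_string(), String::new());
    keys.insert("Cancel".to_string(), "Abbrechen".to_string());

    // Placeholder "translator" so the sketch runs without a model.
    let count = fill_missing(&mut keys, |text| format!("[de] {text}"));

    println!("translated {count} key(s)");
    for (key, value) in &keys {
        println!("{key:?} => {value:?}");
    }
}
```
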

Language should be a standard language code. If in doubt, see the list in Appendix A.1 of https://arxiv.org/pdf/2309.04662.pdf
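
For context, the MADLAD-400 models pick the target language from a `<2xx>` tag at the start of the input text (for example `<2de>` for German). Presumably `--language de` ends up as that kind of tag, but that's an assumption about this tool's internals. `madlad_prompt` below is a hypothetical helper for illustration, not a function from this repo or from candle.

```rust
/// Hypothetical helper: prefixes `text` with a MADLAD-style
/// target-language tag such as `<2de>`.
fn madlad_prompt(language: &str, text: &str) -> String {
    format!("<2{language}> {text}")
}

fn main() {
    let prompt = madlad_prompt("de", "How are you, my friend?");
    println!("{prompt}"); // prints "<2de> How are you, my friend?"
}
```
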

Note that this runs entirely on CPU, because the Transformers GPU version needs too much VRAM to work, and GGUF doesn't want to work on GPU within candle, I guess? "Oh, but why not use regular GGML to run it then?" Because GGML doesn't support T5??? Idk why candle supports GGML-formatted T5 but GGML itself doesn't. AI tech is a mess. If you're reading this after 2024, when this was made, there are hopefully less dumb ways to do what is currently cutting-edge AI stuff.

This will burn your CPU and take forever.

Note that I'm not experienced in Rust and the lifetime syntax is painful, so I might've screwed something up.

## Legal Stuff

This project depends on Candle, which is licensed under either MIT or Apache2. Both licenses are in their repo; don't ask me what that means, idek.

Sections of source code are copied from Candle examples.

This project depends on MADLAD models that Google Research released under Apache2. I'm not entirely clear why a software license is on model weights, but again, idek.

Anything unique to this project is yeeted out freely under the MIT license.

I have no idea whether any legal restrictions apply to the resultant translated text, but you're probably fine, probably (if you have rights to use the source text, at least).

## License

The MIT License (MIT)

Copyright (c) 2024 Alex "mcmonkey" Goodwin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
