The Challenge:
Words carry semantic information. Just as people can infer meaning from a word's context, AI can derive representations for words from their context too! However, the kinds of meaning a model uses may not match ours. We've found a pair of AIs speaking in metaphors that we can't make any sense of! The embedding model is glove-twitter-25. Note that the flag should be fully ASCII and starts with 'htb{some_text}'.
Ever wondered how AI understands metaphors and analogies? This Hack The Box challenge threw me into a linguistic maze filled with strange word pairs and metaphorical riddles. The twist? It had to be solved using GloVe Twitter embeddings.
Each line follows the analogy format:
A is to B, as C is to ?
These were weird combinations, mixing English, Unicode characters, emojis, and foreign scripts. We're told that the embedding model in use is glove-twitter-25.
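In embedding space, such analogies are typically resolved with vector arithmetic: D ≈ B − A + C, then a nearest-neighbor search over the vocabulary. A minimal sketch with toy 3-dimensional vectors (the values are made up purely for illustration, not real glove-twitter-25 embeddings):

```python
import numpy as np

# Toy 3-d "embeddings" -- invented values purely for illustration.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.2, 0.7]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "man is to king as woman is to ?"  ->  king - man + woman
target = vectors["king"] - vectors["man"] + vectors["woman"]

# Pick the vocabulary word closest to the target vector,
# excluding the three query words themselves.
best = max(
    (w for w in vectors if w not in {"king", "man", "woman"}),
    key=lambda w: cosine(vectors[w], target),
)
print(best)  # queen
```

This is exactly the arithmetic that gensim's `most_similar` performs at scale over the full GloVe vocabulary.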
Goal:
Infer the missing fourth term using word embeddings, then extract the final flag, which must be fully ASCII and start with htb{}.
Tools & Setup
- Model: glove-twitter-25
- Library: gensim
- Input: challenge.txt (a list of analogies)
- Output: flag.txt (the inferred flag characters)
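Before any vector math, each challenge line has to be parsed into its three terms. A minimal sketch of the regex used in the script (the sample line here is invented for illustration):

```python
import re

# Pattern matching lines of the form "Like A is to B, C is to?"
# (non-greedy groups so each term stops at the next delimiter).
PATTERN = re.compile(r"Like (.+?) is to (.+?), (.+?) is to\?")

# Invented sample line, purely for illustration.
line = "Like night is to day, cold is to?"
match = PATTERN.search(line)
if match:
    key, value, query = (part.strip() for part in match.groups())
    print(key, value, query)  # night day cold
```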
```python
import re
from gensim.models import KeyedVectors

def load_glove_model():
    model_path = "glove.twitter.27B/glove.twitter.27B.25d.txt"
    # Raw GloVe files have no word2vec header line, hence no_header=True
    model = KeyedVectors.load_word2vec_format(model_path, binary=False, no_header=True)
    return model

def parse_challenge(file_path, model):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    flag_characters = []
    for i, line in enumerate(lines):
        match = re.search(r"Like (.+?) is to (.+?), (.+?) is to\?", line.strip())
        if not match:
            match = re.search(r"Like (.+) is to (.+), (.+) is to\?", line.strip())
        if not match:
            continue

        key, value, query = (part.strip() for part in match.groups())
        print(f"Extracted: '{key}' -> '{value}', '{query}' -> ?")

        try:
            missing_words = [word for word in (key, value, query) if word not in model]
            if missing_words:
                print(f"Skipping due to missing words: {missing_words}")
                continue

            # The analogy vector math: B - A + C ≈ D
            result_vector = model[value] - model[key] + model[query]
            closest_word = model.most_similar(positive=[result_vector], topn=1)[0][0]
            print(f"Closest match for '{query}' is '{closest_word}'")
            flag_characters.append((i, query, closest_word))
        except KeyError as e:
            print(f"Error: {e}")
            continue

    potential_flag = ''.join(char[2] for char in flag_characters)
    print(f"Potential flag sequence: {potential_flag}")

    # Map fullwidth Unicode digits back to their ASCII equivalents
    replacements = {
        '０': '0', '１': '1', '２': '2', '３': '3', '４': '4',
        '５': '5', '６': '6', '７': '7', '８': '8', '９': '9',
    }
    normalized_flag = potential_flag
    for non_ascii, ascii_char in replacements.items():
        normalized_flag = normalized_flag.replace(non_ascii, ascii_char)
    print(f"Normalized flag: {normalized_flag}")
    return normalized_flag

if __name__ == "__main__":
    challenge_file = "challenge.txt"
    model = load_glove_model()
    flag_sequence = parse_challenge(challenge_file, model)
    print("FINAL FLAG:")
    print(flag_sequence)
    # Write just the flag to a clean output file
    with open('flag.txt', 'w') as flag_file:
        flag_file.write(flag_sequence)
    print("Flag has been saved to flag.txt")
```

Steps to Run:
1) Place the GloVe files in "glove.twitter.27B/".
2) Run "python main.py" to process "challenge.txt".
3) The resulting flag is written to "flag.txt".
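Since the flag must be fully ASCII, any fullwidth "compatibility" characters the model emits need folding back to their ASCII forms. As an alternative to a hand-written replacement table, the standard library's `unicodedata.normalize` with NFKC does this in one call; a small sketch (the sample string is invented, not the real flag):

```python
import unicodedata

# NFKC compatibility normalization folds fullwidth forms
# (e.g. '０'..'９', 'ａ'..'ｚ', '＿') into their ASCII equivalents.
raw = "ｈｔｂ{ｅｘａｍｐｌｅ＿１２３}"  # invented sample, not the real flag
normalized = unicodedata.normalize("NFKC", raw)
print(normalized)        # htb{example_123}
print(normalized.isascii())  # True
```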
This challenge was a fun mix of NLP, embeddings, and CTF logic. It’s not every day you have AIs “speaking in metaphors,” and it was fascinating to reverse-engineer that conversation!
