The Challenge:
Words carry semantic information. Just as people can infer meaning from a word's context, AI can derive representations for words from their context too! However, the kinds of meaning a model uses may not match ours. We've found a pair of AIs speaking in metaphors that we can't make any sense of! The embedding model is glove-twitter-25. Note that the flag should be fully ASCII and starts with 'htb{some_text}'.
Ever wondered how AI understands metaphors and analogies? This Hack The Box challenge threw me into a linguistic maze filled with strange word pairs and metaphorical riddles. The twist? It had to be solved using GloVe Twitter embeddings.
Each line follows the analogy format:
A is to B, as C is to ?
These were weird combinations, mixing English, Unicode characters, emojis, and foreign scripts. We're told that the embedding model in use is glove-twitter-25.
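In embedding space, such analogies are typically resolved with vector arithmetic: D ≈ B − A + C, then a nearest-neighbor search over the vocabulary. A minimal sketch with toy 3-dimensional vectors (the values are made up purely for illustration, not real glove-twitter-25 embeddings):

```python
import numpy as np

# Toy 3-d "embeddings" -- invented values purely for illustration.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.2, 0.7]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "man is to king as woman is to ?"  ->  king - man + woman
target = vectors["king"] - vectors["man"] + vectors["woman"]

# Pick the vocabulary word closest to the target vector,
# excluding the three query words themselves.
best = max(
    (w for w in vectors if w not in {"king", "man", "woman"}),
    key=lambda w: cosine(vectors[w], target),
)
print(best)  # queen
```

This is exactly the arithmetic that gensim's `most_similar` performs at scale over the full GloVe vocabulary.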
Goal:
Infer the missing fourth term using word embeddings, then extract the final flag, which must be fully ASCII and start with htb{}.
Tools & Setup
- Model: glove-twitter-25
- Library: gensim
- Input: challenge.txt (a list of analogies)
- Output: flag.txt (the inferred flag characters)
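Before any vector math, each challenge line has to be parsed into its three terms. A minimal sketch of the regex used in the script (the sample line here is invented for illustration):

```python
import re

# Pattern matching lines of the form "Like A is to B, C is to?"
# (non-greedy groups so each term stops at the next delimiter).
PATTERN = re.compile(r"Like (.+?) is to (.+?), (.+?) is to\?")

# Invented sample line, purely for illustration.
line = "Like night is to day, cold is to?"
match = PATTERN.search(line)
if match:
    key, value, query = (part.strip() for part in match.groups())
    print(key, value, query)  # night day cold
```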
```python
import re
from gensim.models import KeyedVectors

def load_glove_model():
    model_path = "glove.twitter.27B/glove.twitter.27B.25d.txt"
    # Raw GloVe files have no word2vec header line, hence no_header=True
    model = KeyedVectors.load_word2vec_format(model_path, binary=False, no_header=True)
    return model

def parse_challenge(file_path, model):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    flag_characters = []
    for i, line in enumerate(lines):
        match = re.search(r"Like (.+?) is to (.+?), (.+?) is to\?", line.strip())
        if not match:
            match = re.search(r"Like (.+) is to (.+), (.+) is to\?", line.strip())
        if not match:
            continue

        key, value, query = (part.strip() for part in match.groups())
        print(f"Extracted: '{key}' -> '{value}', '{query}' -> ?")

        try:
            missing_words = [word for word in (key, value, query) if word not in model]
            if missing_words:
                print(f"Skipping due to missing words: {missing_words}")
                continue

            # The analogy vector math: B - A + C ≈ D
            result_vector = model[value] - model[key] + model[query]
            closest_word = model.most_similar(positive=[result_vector], topn=1)[0][0]
            print(f"Closest match for '{query}' is '{closest_word}'")
            flag_characters.append((i, query, closest_word))
        except KeyError as e:
            print(f"Error: {e}")
            continue

    potential_flag = ''.join(char[2] for char in flag_characters)
    print(f"Potential flag sequence: {potential_flag}")

    # Map fullwidth Unicode digits back to their ASCII equivalents
    replacements = {
        '０': '0', '１': '1', '２': '2', '３': '3', '４': '4',
        '５': '5', '６': '6', '７': '7', '８': '8', '９': '9',
    }
    normalized_flag = potential_flag
    for non_ascii, ascii_char in replacements.items():
        normalized_flag = normalized_flag.replace(non_ascii, ascii_char)
    print(f"Normalized flag: {normalized_flag}")
    return normalized_flag

if __name__ == "__main__":
    challenge_file = "challenge.txt"
    model = load_glove_model()
    flag_sequence = parse_challenge(challenge_file, model)
    print("FINAL FLAG:")
    print(flag_sequence)
    # Write just the flag to a clean output file
    with open('flag.txt', 'w') as flag_file:
        flag_file.write(flag_sequence)
    print("Flag has been saved to flag.txt")
```

Steps to Run:
1) Place the GloVe files in "glove.twitter.27B/".
2) Run "python main.py" to process "challenge.txt".
3) The resulting flag is written to "flag.txt".
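Since the flag must be fully ASCII, any fullwidth "compatibility" characters the model emits need folding back to their ASCII forms. As an alternative to a hand-written replacement table, the standard library's `unicodedata.normalize` with NFKC does this in one call; a small sketch (the sample string is invented, not the real flag):

```python
import unicodedata

# NFKC compatibility normalization folds fullwidth forms
# (e.g. '０'..'９', 'ａ'..'ｚ', '＿') into their ASCII equivalents.
raw = "ｈｔｂ{ｅｘａｍｐｌｅ＿１２３}"  # invented sample, not the real flag
normalized = unicodedata.normalize("NFKC", raw)
print(normalized)        # htb{example_123}
print(normalized.isascii())  # True
```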
This challenge was a fun mix of NLP, embeddings, and CTF logic. It’s not every day you have AIs “speaking in metaphors,” and it was fascinating to reverse-engineer that conversation!
