Posted on Nov 30, 2021

Solving Interview Problems with Deep Learning

#tensorflow #machinelearning #programminginterviews #technicalinterview

![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qksna9yv4dtg1hw0dzz9.jpg)_Photo credit: Godzilla vs. Kong_

It’s been a while since I’ve practiced programming interview questions, and I worry that my skills are lacking. It’s always important to work through some every now and then to stay sharp so here we go:

Let’s start with Fizz Buzz:

Write a program that prints the numbers from 1 to 100. But for multiples of 3print “Fizz” instead of the number and for the multiples of 5 print “Buzz”. For numbers which are multiples of both 3 and 5 print “FizzBuzz”.

This solution was actually inspired by Joel Grus. We can represent numbers with binary encoding:

import tensorflow as tf import numpy as np INPUT_SIZE = 10 # We can encode up to 2^15 numbers with binary encoding. HIDDEN_SIZE = 100 # Hidden layer of size 100 OUTPUT_SIZE = 4 # One hot encoding for the 4 possible outputs: "Fizz" "Buzz" "FizzBuzz", "" NUM_EPOCHS = 10000 def binary_encode(number): data = np.zeros(INPUT_SIZE) for bitshift in range(INPUT_SIZE): data[bitshift] = number >> bitshift & 1 # Get the bit at position bitshift return data def fizz_buzz_encode(i): if i % 15 == 0: return np.array([0, 0, 0, 1]) elif i % 5 == 0: return np.array([0, 0, 1, 0]) elif i % 3 == 0: return np.array([0, 1, 0, 0]) else: return np.array([1, 0, 0, 0]) def fizz_buzz_decode(encoding): max_idx = np.argmax(encoding) if max_idx == 0: return "" if max_idx == 1: return "fizz" if max_idx == 2: return "buzz" if max_idx == 3: return "fizzbuzz"

Now, we can generate some training data (we will generate training data from 101 to 1024) since our actual problem will solve fizzbuzz for 1–100.

X_train = np.array([binary_encode(i) for i in range(101, 2 ** INPUT_SIZE)]) y_train = np.array([fizz_buzz_encode(i) for i in range(101, 2 ** INPUT_SIZE)])

And now, let’s define our multilayer perceptron model that will actually perform the bulk of the work. First, we define the inputs to the model:

X = tf.placeholder('float', [None, INPUT_SIZE]) Y = tf.placeholder('float', [None, OUTPUT_SIZE])

Next, let’s define the weights and biases that we will learn via backpropagation:

w1 = tf.get_variable("w1", [INPUT_SIZE, HIDDEN_SIZE], initializer=tf.random_normal_initializer()) b1 = tf.get_variable("b1", [HIDDEN_SIZE]) w2 = tf.get_variable("w2", [HIDDEN_SIZE, OUTPUT_SIZE], initializer=tf.random_normal_initializer()) b2 = tf.get_variable("b2", [OUTPUT_SIZE])

And now, let’s feed our input through the model:

z1 = tf.add(tf.matmul(X, w1), b1) a1 = tf.nn.relu(z1) z2 = tf.add(tf.matmul(a1, w2), b2)

Then, let’s compute the loss and tell tensorflow to minimize that loss:

cost = tf.nn.softmax_cross_entropy_with_logits(logits=z2, labels=Y) optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(cost)

Now, let’s actually feed our real training data into that model:

init = tf.global_variables_initializer() with tf.Session() as sess: sess.run(init) for i in range(NUM_EPOCHS): c, o = sess.run([cost, optimizer], feed_dict={X: X_train, Y: y_train}) print(np.sum(c)) X_test = np.array([binary_encode(i) for i in range(1, 100)]) y_test = np.array([fizz_buzz_encode(i) for i in range(1, 100)]) pred = sess.run(z2, feed_dict={X:X_test, Y:y_test})

And print out the results:

def real_fizz_buzz(i): txt = "" if i % 3 == 0: txt += "fizz" if i % 5 == 0: txt += "buzz" return txt for i in range(len(X_test)): text = fizz_buzz_decode(pred[i]) true_text = real_fizz_buzz(i + 1) print("{}: {} ({})".format(i + 1, text, "✔" if text == true_text else "x"))

Results:

1: (✔) 2: (✔) 3: (x) ... 97: (✔) 98: (✔) 99: fizz (✔) Total Accuracy: 89.89%

For all that work, we achieved an embarrassingly low 90% accuracy. I was going to try to fix this but then quickly lost interest. Let’s move onto something a bit more interesting:

Write a program that counts the number of unique characters in a string.

This sounds like a problem that can be solved with an LSTM. Let’s generate some data.

import tensorflow as tf import random import numpy as np CHARS = ['a', 'b', 'c', 'd', 'e', 'f', 'g'] STRING_LENGTH = 12 num_examples = 10000 # Args: # n: Number of examples to generate. # Returns: # strings_v: numpy array of the form (n, STRING_LENGTH, len(CHARS)). One hot encoding of sequences of text # strings: Array of actual generated random text: # uniques_v: numpy array of the form (n, len(CHARS)). One hot encoding of number of unique characters # uniques: numpy array of length n, number of unique characters for each sequence. def generate_data(n=num_examples): chars_to_idx = { c: i for i, c in enumerate(CHARS)} strings_v = np.zeros([n, STRING_LENGTH, len(CHARS)]) strings = [''] * n uniques = np.zeros(n) uniques_v = np.zeros([n, len(CHARS)]) for x in range(n): for y in range(STRING_LENGTH): random.shuffle(CHARS) char = CHARS[0] strings_v[x][y][chars_to_idx[char]] = 1 strings[x] += char uniques[x] = len(set(strings[x])) uniques_v[x][len(set(strings[x])) - 1] = 1 return strings_v, strings, uniques_v, uniques

Next let’s create our LSTM model:

HIDDEN_LAYERS = 64 X = tf.placeholder("float", [None, STRING_LENGTH, len(CHARS)]) y = tf.placeholder("float", [None, len(CHARS)]) X_seq = tf.unstack(X, STRING_LENGTH, 1) lstm_cell = tf.contrib.rnn.BasicLSTMCell(HIDDEN_LAYERS) #sequence of 12 chars to output of 7 outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, X_seq, dtype=tf.float32) final_output = outputs[-1] weights = tf.get_variable("weights", [HIDDEN_LAYERS, len(CHARS)], initializer=tf.random_normal_initializer()) biases = tf.get_variable("biases", [len(CHARS)], initializer=tf.random_normal_initializer()) prediction = tf.add(tf.matmul(final_output, weights), biases) cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y)) optimizer = tf.train.AdamOptimizer(1e-2) train_op = optimizer.minimize(cost)

Now, let’s go ahead and run our model:

init = tf.global_variables_initializer() costs = [] EPOCHS = 300 with tf.Session() as sess: sess.run(init) for i in range(EPOCHS): _, c, _ = sess.run([train_op, cost, prediction], feed_dict={X:X_train, y: y_train}) costs.append(c) if i % 10 == 0: print('cost: {}, epoch: {}'.format(c, i)) X_test, X_test_strings, y_test, y_test_strings = generate_data(100) p = sess.run(prediction, feed_dict={X: X_test, y: y_test}) prediction_idxs = np.argmax(p, axis=1) prediction_vals = prediction_idxs + 1 correct = 0.0 for i in range(len(y_test_strings)): string = X_test_strings[i] actual_val = y_test_strings[i] predicted_val = prediction_vals[i] # Print the first 5 examples if i < 5: print('string: {}, pred: {}, actual: {}'.format(string, predicted_val, actual_val)) if predicted_val == actual_val: correct += 1 print("{}% accuracy\n\n".format(correct * 100 / len(y_test_strings)))

cost: 0.0496655665338, epoch: 280 string: eeaccbdeagac, pred: 6, actual: 6.0 string: gefaddbbcfac, pred: 7, actual: 7.0 string: acedcbgdcagf, pred: 7, actual: 7.0 string: aadebcdacefg, pred: 7, actual: 7.0 string: abeeaebcbbag, pred: 5, actual: 5.0 100% accuracycost: 0.0413268692791, epoch: 290 string: ebbfbfaebede, pred: 5, actual: 5.0 string: fgaccdbabedg, pred: 7, actual: 7.0 string: dgdcbaefcdad, pred: 7, actual: 7.0 string: ffagdfedccad, pred: 6, actual: 6.0 string: gfggfgbebcdb, pred: 6, actual: 6.0 99% accuracy

Hopefully the interviewer won’t be disappointed that we cannot solve this problem for any string that’s greater than MAX_LENGTH, but overall, it achieved a 99% accuracy. I didn’t deal with variable length inputs in this case, we could’ve easily done that by adding padding.

Now, instead of just printing out the number of unique characters, print out the actual unique characters in the order they appear.

Ex: “acabdb” => “acbd”

Oof. Thats a much tougher problem. In this case, the value we need to return is a sequence of characters instead of a single character or number so we will need some form of sequence to sequence model in order to learn this relationship.

import numpy as np import tensorflow as tf from tensorflow.contrib import rnn import random #"abc" => "abc" #"aabbac" => "abc" #"abacd" => "abcd" MAX_LENGTH = 6 # Max length of 6 chars = ["a", "b", "c", "d", "e", "f"] all_chars = chars + [' '] # Space for padding NUM_EXAMPLES = 50000 # Args: # n: number of examples to generate # Returns: # strings: list of strings that may contain duplicates # solutions: strings without duplicates # strings_v: One hot encoding of strings with duplicates (without padding) # solutions_v: One hot encoding of solutions (with padding) def generate_data(n=NUM_EXAMPLES): all_chars_to_idx = { c:i for i, c in enumerate(all_chars) } strings_v = np.zeros((NUM_EXAMPLES, MAX_LENGTH, len(all_chars))) solutions_v = np.zeros((NUM_EXAMPLES, MAX_LENGTH, len(all_chars))) strings = [''] * NUM_EXAMPLES solutions = [''] * NUM_EXAMPLES for i in range(NUM_EXAMPLES): for l in range(MAX_LENGTH): char = random.choice(chars) # only sample from valid characters strings[i] += char if char not in solutions[i]: solutions[i] += char # Pad solutions strings num_missing = MAX_LENGTH - len(solutions[i]) solutions[i] += ' ' * num_missing for x in range(len(strings)): for y in range(MAX_LENGTH): string_char = strings[x][y] strings_v[x][y][all_chars_to_idx[string_char]] = 1 solution_char = solutions[x][y] solutions_v[x][y][all_chars_to_idx[solution_char]] = 1 return strings, solutions, strings_v, solutions_v

Again, we can one hot encode the sequences, but this time, we will add some padding since the length of the output string is variable. Instead of trying to return a variable length sequence, we will just return a sequence that is equal to the length of the input, and pad the output with spaces.

For example, “abdab” would map to a string of the same length, but with spaces as padding: “abd “.

Let’s create a test and training split.

strings, solutions, strings_v, solutions_v = generate_data() split_at = len(strings) - (len(strings) // 10) strings_train = strings[:split_at] solutions_train = solutions[:split_at] X_train = strings_v[:split_at] y_train = solutions_v[:split_at] strings_test = strings[split_at:] solutions_test = solutions[split_at:] X_test = strings_v[split_at:] y_test = solutions_v[split_at:]

Now we can set up our model. Let’s start with the encoder:

encoded_input = tf.placeholder(tf.float32, shape=(None, MAX_LENGTH, len(all_chars))) decoded_input = tf.placeholder(tf.float32, shape=(None, MAX_LENGTH, len(all_chars))) with tf.name_scope("basic_rnn_seq2seq") as scope: encoded_sequence = tf.unstack(encoded_input, MAX_LENGTH, 1) encoder_cell = rnn.BasicLSTMCell(128, forget_bias=1.0) encoded_outputs, states = rnn.static_rnn(encoder_cell, encoded_sequence, dtype=tf.float32)

And now, the decoder:

with tf.name_scope("lstm_decoder") as scope: decoded_sequence = tf.unstack(decoded_input, MAX_LENGTH, 1) decoder_cell = rnn.BasicLSTMCell(128, reuse=True) decoded_outputs, _ = rnn.static_rnn(decoder_cell, decoded_sequence, initial_state=states, dtype=tf.float32)

Now, we can compute the predictions by multiplying the decoder’s hidden layer output with a fully connected layer:

with tf.name_scope("fully_connected") as scope: weights = tf.get_variable('weights', (128, len(all_chars)), initializer=tf.random_normal_initializer()) biases = tf.get_variable('biases', (len(all_chars)), initializer=tf.random_normal_initializer()) predictions = [] encoded_sequence = tf.unstack(decoded_input, MAX_LENGTH, 1) for output in decoded_outputs: prediction = tf.add(tf.matmul(output, weights), biases) predictions.append(prediction) concatenated_outputs = tf.stack(predictions, 0) concatenated_outputs = tf.transpose(concatenated_outputs, perm=[1, 0, 2]) concatenated_inputs = tf.concat(decoded_input, 0)

Now, we can compute the loss:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=concatenated_outputs, labels=concatenated_inputs)) # FC Layer optimizer = tf.train.AdamOptimizer(1e-2) train_op = optimizer.minimize(cost)

And now let’s run our model:

def decode_guess(one_hot): return ''.join([all_chars[m] for m in np.argmax(one_hot, axis=1)]) init = tf.global_variables_initializer() costs = [] with tf.Session() as sess: sess.run(init) for i in range(30): e, r, c, t, c_out, c_in = sess.run([encoded_outputs, predictions, cost, train_op, concatenated_outputs, concatenated_inputs], feed_dict={encoded_input: X_train, decoded_input: y_train}) costs.append(c) if i % 5 == 0: print('training cost: {}, epoch: {}'.format(c, i)) results = sess.run(predictions, feed_dict={encoded_input: X_test, decoded_input: y_test}) guesses = np.array(results).transpose(1, 0, 2) for i in range(5): string = strings_test[i] solution = solutions_test[i] guess_decoded = decode_guess(guesses[i]) print("{}: {} - {}".format(string, solution, guess_decoded)) correct = 0.0 for i in range(len(strings_test)): string = strings_test[i] solution = solutions_test[i] guess_decoded = decode_guess(guesses[i]) if i < 5: print("input: {}, solution: {}, prediction: {}".format(string, solution, guess_decoded)) if solution == guess_decoded: correct += 1 print("{} % accuracy".format(correct / len(strings_test) * 100))

And here are the results:

training cost: 2.07600975037, epoch: 0 cebbdf: cebdf - dffcbc: dfcb - aadeab: adeb - faceec: face - bfaeec: bfaec - 0.0 % accuracy...training cost: 0.219934388995, epoch: 25 cebbdf: cebdf - cebdf dffcbc: dfcb - dfcb aadeab: adeb - adeb faceec: face - face bfaeec: bfaec - bfaec 100.0 % accuracy

It looks like by 25 epochs, our model has a fairly good understanding of how to solve this problem.