HackTheBox and Walkthrough Guide Methodology
I suspect a lot of the walkthrough guides submitted to HackTheBox (HTB) are written by people who don’t fully understand the methodology behind what they’re doing for certain sections.
I say that in as non-judgemental a way as possible; I do HTB purely for fun, do not consider myself an IT security professional (though it overlaps with my position as a systems and network engineer), and am yet to truly complete a box without getting hints or consulting a walkthrough! I really do appreciate having them. But I feel a lot of the guides breeze over the reason why something works or why it was attempted in the first place; they state that you need to do X to achieve Y, but don’t tell you much beyond that. I believe this is because they were given a strong nudge towards it by somebody who does understand, or told outright what to do. I see this time and time again in the walkthroughs.
‘Bank’ is a retired, easy-rated Linux machine on HTB that I feel suffers from this. I’m not going to do a full write-up on it; this box was released in mid-June 2017, so there are many full guides already. But there are two particular sections of this machine that I feel a lot of walkthrough authors breezed over:
- The reasoning as to why you need to change your local hosts file to access the “hidden” logon portal
- The technique for identifying the unencrypted transaction file
Setting The Local Hosts File
Here are the explanations given by the top-rated walkthroughs on HTB as to why you need to add bank.htb to your local hosts file:
“The hostname had to be guessed…this follows the standard convention of HTB machines of the format <machinename>.htb”
“We need to set the hostname…standard convention of HTB machines”
“The hostname must be guessed on this machine (bank.htb) and then added to /etc/hosts”
You’ll notice a lot of plagiarism between walkthrough guides, and that last quote is from the official pinned walkthrough on HTB! None of the above explain why you need to change your hosts file. Yes, if you have experience with HTB you might try adding bank.htb to your hosts file since it’s worked in the past on other boxes, but that’s not understanding the process or reasoning.
The actual reason why this works is very simple: an Apache web server can serve multiple domains and sites on a single interface or IP address (name-based virtual hosting), with each request directed to the relevant virtual site based on the hostname your browser sends in the HTTP Host header, which it takes from the URL you entered. When you enter http://10.10.10.29, no virtual site matches that name, so you are served the default site, in this case the Apache default webpage. When you browse to http://bank.htb, the request matches a different virtual site, resulting in the HTB Bank login page. Because there is no public DNS to resolve bank.htb for us, we need to add the relevant entry to our hosts file, so that entering bank.htb into our browser resolves to the host’s IP address, 10.10.10.29.
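To make the mechanism concrete, here is a minimal local sketch of a server choosing a site from the Host header. It is a stand-in for illustration only, not how Apache is implemented: the VHOSTS table, the page strings, and the function names are all my own assumptions, and it runs entirely on 127.0.0.1.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site table: hostname -> page body (stand-ins for real sites).
VHOSTS = {"bank.htb": "HTB Bank login page"}
DEFAULT_SITE = "Apache default page"


def select_site(host_header):
    """Pick a site from the Host header, falling back to the default site."""
    host = (host_header or "").split(":")[0].strip().lower()  # drop any ":port"
    return VHOSTS.get(host, DEFAULT_SITE)


class VHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = select_site(self.headers.get("Host")).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(port, host_header):
    """GET / from the local demo server, sending the given Host header."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/", headers={"Host": host_header})
    body = conn.getresponse().read().decode()
    conn.close()
    return body


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), VHostHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        # Same IP and port both times; only the Host header differs.
        print(fetch(port, "127.0.0.1"))
        print(fetch(port, "bank.htb"))
    finally:
        server.shutdown()
        server.server_close()
```

Both requests go to the same address, and only the Host header changes which page comes back. The hosts file entry matters for the same reason: it lets the browser connect to 10.10.10.29 while still sending bank.htb as the hostname.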
What makes a person think to try this in the first place though? What tipped me off on this box was that TCP port 53 was open (most DNS queries use UDP 53, with TCP 53 typically used for zone transfers and large responses, but it is DNS nonetheless). This was a large clue that name resolution was important for this box (and interestingly, you could also point your DNS to the IP of the box and allow it to resolve all name lookups instead of modifying your hosts file). Unfortunately there is still some guesswork involved, in that you need to know that .htb is commonly used as the TLD on HTB, and to guess that ‘bank’ was indeed the hostname of the box, as unrealistic as this is.
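If you would rather test a guessed name against the box’s DNS service before touching your hosts file, a query can be sent to it directly. The following is a minimal sketch using only the standard library, assuming the service on 10.10.10.29 answers ordinary A-record queries over TCP 53; the function names are my own, and it reads only the response code and answer count rather than parsing the records.

```python
import socket
import struct


def build_query(name, qtype=1, txid=0x1234):
    """Build a minimal DNS query (qtype 1 = A record, class IN)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)


def recv_exact(sock, n):
    """Read exactly n bytes from the socket or raise if it closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-reply")
        buf += chunk
    return buf


def query_tcp(server, name, port=53, timeout=5):
    """Send an A query over TCP and return (rcode, answer_count)."""
    msg = build_query(name)
    with socket.create_connection((server, port), timeout=timeout) as s:
        # DNS over TCP prefixes every message with a 2-byte length.
        s.sendall(struct.pack(">H", len(msg)) + msg)
        length = struct.unpack(">H", recv_exact(s, 2))[0]
        reply = recv_exact(s, length)
    flags, _qdcount, ancount = struct.unpack(">HHH", reply[2:8])
    return flags & 0x000F, ancount  # rcode 0 = NOERROR, 3 = NXDOMAIN


if __name__ == "__main__":
    # Requires a connection to the HTB network.
    print(query_tcp("10.10.10.29", "bank.htb"))
```

A response code of 0 with at least one answer would suggest the server knows the name; a code of 3 means it does not. The guess at the name itself is still required, so this only removes the trial and error from confirming it.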
Finding The Unencrypted Transaction File
Later in this box, you gain access to a web directory (/balance-transfer) which contains hundreds of .acc files. These turn out to be banking transaction logs, containing username, email, and password fields, all encrypted.
The process here seemed to be uniform across walkthroughs; one of the files has a file size that is “different” from, and “smaller” than, the others, and sure enough that file contains clear-text fields, as the encryption “failed”. The process for finding this file ranged from sorting files by size (purely on a hunch) to scrolling down through every single file until one popped out as different…
This left me a little dissatisfied. I’m not a big fan of stumbling across the correct file on a hunch, or laboriously checking every single file manually. Granted, looking for files of a contrasting size is a valid way to identify files of interest, but it’s just too convenient and not particularly realistic that all the other files are pretty much exactly the same size. And what if there were millions of files instead of a few hundred? Neither of these processes would be efficient.
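For what it’s worth, even the size-based approach can be scripted rather than eyeballed. Here is a rough sketch, assuming the files have been downloaded to a local directory; the function name and the 10% tolerance are arbitrary choices of mine, and it still relies on the convenient assumption that almost every file is close to the same size.

```python
import os
import statistics


def size_outliers(directory, tolerance=0.10):
    """Return names of files whose size strays from the median size."""
    with os.scandir(directory) as entries:
        sizes = {e.name: e.stat().st_size for e in entries if e.is_file()}
    if not sizes:
        return []
    median = statistics.median(sizes.values())
    # Flag anything more than `tolerance` (as a fraction) away from the median.
    return sorted(
        name for name, size in sizes.items()
        if abs(size - median) > tolerance * median
    )


if __name__ == "__main__":
    for name in size_outliers("/home/user01/bank/bank.htb/balance-transfer/"):
        print("Unusual size:", name)
```

This scales to a large directory well enough, but it says nothing about why a file is different, which is the gap the content-based check addresses.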
A much more “realistic” process, or at least one that scales better and actually takes into account the information we can gather from the encrypted .acc files, is to run a script that analyses the content and flags any files that are different.
Each file has a status string, “ENCRYPT SUCCESS”, in its header.
We can presume from this that a process is taking place to encrypt each file individually, and if that process does not work correctly, that header string will be missing or different in some way. I downloaded the files with wget and wrote the below Python script to test each file for this “SUCCESS” string, outputting any abnormalities:
from os import listdir
from os.path import join

# Directory holding the .acc files downloaded with wget
acc_file_path = "/home/user01/bank/bank.htb/balance-transfer/"

for name in listdir(acc_file_path):
    path = join(acc_file_path, name)
    with open(path, "r") as f:
        content = f.read()
    # Report any file that lacks the status string
    if "SUCCESS" not in content:
        print("No SUCCESS string:", path)
This identified the abnormal file, which contained the clear text username and password.
As somebody who does HTB as a hobby, I do find myself consulting the walkthroughs after a few days if I’m stuck on a box. While great for letting you know the steps involved in cracking a box, I’m starting to see how they often lack an explanation as to the thinking and methodology behind particular steps. You can argue that not everything should be handed to you on a plate, but I believe that some kind of indication as to why a process was attempted is important. As always, to get the most from these boxes in terms of knowledge and learning, it’s best to dig deeper into the parts that you don’t understand.