**Note: The content in this article is only for educational purposes and understanding of cybersecurity concepts. It should enable people and organizations to have a better grip on threats and know how to protect themselves against them. Please use this information responsibly.**
Web Archive
Search Engine Web Cache
Before discussing web archives, note that search engines such as Google and Bing periodically crawl publicly accessible web pages to index them. Indexing gives the search engine the information it needs to match search queries to websites, and it also produces a cached copy of each crawled page.
If a cached copy is available, then you can view this copy instead of visiting the website directly. This can be useful during Red Team or offensive exercises when the goal might be to stay off the radar of the target organization and out of the access logs. The cached copy might also be useful if the website is temporarily down or if it has changed recently and you want to access the older version.
How to access search engine caches?
We can access Google’s cache for a particular website by using the cache: operator with the domain that we want to see.
Let's see a demonstration.
Target site: https://www.infosectrain.com
Using the cache: operator, type the following query into the Google search box:
cache:https://www.infosectrain.com
Press Enter
Google then returns the most recent cached copy it holds for the website.
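The same query can be built programmatically, which helps when checking many targets. The following is a minimal sketch: the function name `google_cache_query_url` is hypothetical, and it only constructs the search URL without sending any request. Search engines can change or withdraw operators such as cache:, so a query built this way may not return a cached page.

```python
from urllib.parse import quote_plus


def google_cache_query_url(target: str) -> str:
    """Build a Google search URL that applies the cache: operator to `target`.

    Hypothetical helper for illustration; it builds a string and performs
    no network access.
    """
    query = f"cache:{target}"
    # quote_plus percent-encodes ':' and '/' so the query survives as one parameter
    return "https://www.google.com/search?q=" + quote_plus(query)


print(google_cache_query_url("https://www.infosectrain.com"))
```

Opening the printed URL in a browser is equivalent to typing the cache: query into the search box by hand.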
Web Archive
To go back further than just the previous cache, you need to use a web archive.
The Wayback Machine is a tool commonly used for this purpose. It has archived over 338 billion web pages, with snapshots dating back to 1996.
To use The Wayback Machine, just navigate to web.archive.org and enter the URL to be searched. The dates of any archived pages will be returned, and you can select a date to view a snapshot of the website at that moment in time.
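The lookup described above can also be scripted. The sketch below assumes the Wayback Machine availability endpoint at `https://archive.org/wayback/available`, which returns JSON describing the snapshot closest to an optional timestamp. The helper names (`availability_url`, `closest_snapshot`, `lookup`) are hypothetical, and the response is parsed defensively in case no snapshot exists.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(target, timestamp=None):
    """Build the availability query; `timestamp` is YYYYMMDD (optional)."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def lookup(target, timestamp=None):
    """Query the endpoint over the network and return the closest snapshot."""
    with urlopen(availability_url(target, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))

# Example call (performs a network request):
# lookup("https://www.infosectrain.com", "20210305")
```

Keeping URL construction and response parsing separate from the network call makes each part easy to check on its own.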
Target site: https://www.infosectrain.com
Enter the target site into the search field on the Wayback Machine home page and press Enter.
We can see that this site has been saved 175 times. Now let us select 5 March 2021 to view the snapshot taken on that date.
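Selecting a date in the calendar corresponds to a predictable URL pattern, `https://web.archive.org/web/<timestamp>/<original URL>`. The sketch below builds such a link for the date used in this walkthrough. The function name is hypothetical, and it relies on the assumption that the Wayback Machine redirects a date-only timestamp to the nearest capture it holds, so the page that opens may come from a neighbouring date.

```python
from datetime import date


def wayback_snapshot_url(target: str, day: date) -> str:
    """Build a Wayback Machine link for `target` on (or near) `day`."""
    stamp = day.strftime("%Y%m%d")  # date-only timestamp, e.g. 20210305
    return f"https://web.archive.org/web/{stamp}/{target}"


print(wayback_snapshot_url("https://www.infosectrain.com", date(2021, 3, 5)))
# prints https://web.archive.org/web/20210305/https://www.infosectrain.com
```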
A web archive can be used to identify pages (or functionality) of a site that were once publicly available but have since been hidden or removed. It can also be used to find historical information about an organization during the open-source intelligence (OSINT) gathering stage of a penetration test or ethical hacking engagement.
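One way to look for such pages is to list every URL the archive has captured under a domain and compare that list with the live site. The sketch below assumes the Wayback Machine CDX endpoint at `https://web.archive.org/cdx/search/cdx` with its `url`, `output`, `fl`, `collapse`, and `limit` parameters. The helper names are hypothetical, and this should only be run against targets you are authorized to assess.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain, limit=50):
    """Build a CDX query listing captures under `domain` (one per unique URL)."""
    params = {
        "url": f"{domain}/*",         # trailing wildcard: everything under the domain
        "output": "json",             # JSON rows; the first row is the header
        "fl": "timestamp,original",   # fields to return
        "collapse": "urlkey",         # drop repeated captures of the same URL
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows):
    """Turn [header, row, row, ...] into a list of dicts keyed by header."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def archived_urls(domain, limit=50):
    """Fetch and return the sorted unique archived URLs for `domain`."""
    with urlopen(cdx_query_url(domain, limit), timeout=60) as resp:
        body = resp.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    return sorted({rec["original"] for rec in parse_cdx_rows(rows)})

# Example call (performs a network request):
# archived_urls("infosectrain.com", limit=100)
```

URLs that appear in this list but no longer resolve on the live site are candidates for the hidden or retired pages mentioned above.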