Hi All,
So yesterday I decided to move my site from github.io to one of my personal domains. I am not sure why I decided this; GitHub provides a great free service, and I had lots of automation in place. In this article I will just cover how I went about updating all pages with redirects from GitHub to the new domain.
The Goal
My goal was to leave my old site up on GitHub for a few weeks to give Google some time to update its index, with every old page redirecting to its new counterpart. This requires a redirect in place for each page. After that I will remove the GitHub site repo.
Limitations with GitHub
From my understanding, GitHub Pages has no support for server-side redirects. This left me with the option of using HTML/JavaScript redirects.
Resources and Research
I did a fair amount of reading before tackling this mini project.
- Check file extension - Stack Overflow
- Shell Parameter Expansion
- JavaScript SEO Surprise
- Setup a redirect on GitHub Pages
- JavaScript Redirect with Canonical
- JavaScript Page Redirect
- Is HTML Canonical - Stack Overflow
- How to Redirect a Web Page
- Bash: Recursively Travel a Directory of N Levels - Stack Overflow
- Permanently Redirect GitHub Pages
Implementation
I chose to go with the HTML rewrite approach.
One of the reasons for this is that I create my site using two GitHub repos. One is private, with some workflow actions that populate the second, public repo. The public repo only has the generated webpage code.
Initial Setup
Since I am focusing on the rewrite script, I am going to assume you have done the following things.
- Have the new site up and running on the new domain.
- Disabled Google Analytics and AdSense on the old site, or already moved these properties to the new domain if you are using them.
Getting things into place
In my case this was pretty straightforward. My site is being served from a repo that only contains the page code. So, working in a separate area from my normal local copy of my private repo, I simply cloned the public site to my local machine.
git clone git@github.com:Bas-Man/bas-man.github.io.git
Within this directory, I then removed files and folders that would not be needed.
- css
- images
- fonts
Those sorts of things.
The script components
New HTML file contents
I decided to go with the following HTML code, with some adjustments.
<!DOCTYPE HTML>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="refresh" content="0;url={{THE_NEW_URL}}" />
<link rel="canonical" href="{{THE_NEW_URL}}" />
</head>
<body>
<h1>
The page has been moved to <a href="{{THE_NEW_URL}}">{{THE_NEW_URL}}</a>
</h1>
</body>
</html>
Within this local repo I created a file called source with the following contents:
<!DOCTYPE HTML>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="refresh" content="0;url=https://bas-man.dev/NAME" />
<link rel="canonical" href="https://bas-man.dev/NAME" />
</head>
<body>
<h1>
The page has been moved to <a href="https://bas-man.dev/NAME">https://bas-man.dev</a>
</h1>
</body>
</html>
The value NAME is just a placeholder and will be replaced in each file using the good old trusty sed command.
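As a rough sketch of that substitution, here is a single replacement run against a cut-down template. The scratch directory and the page path posts/hello/index.html are made up for illustration; only the NAME placeholder and the bas-man.dev domain come from the file above.

```shell
#!/usr/bin/env bash
# Sketch: fill the NAME placeholder for one hypothetical page path.
set -eu

workdir=$(mktemp -d)            # scratch directory so nothing real is touched
trap 'rm -rf "$workdir"' EXIT
template="$workdir/source"      # plays the role of the "source" file

# A shortened stand-in for the real template.
cat > "$template" <<'EOF'
<meta http-equiv="refresh" content="0;url=https://bas-man.dev/NAME" />
<link rel="canonical" href="https://bas-man.dev/NAME" />
EOF

page="posts/hello/index.html"   # hypothetical page path

# "|" is the sed delimiter because the path itself contains "/".
result=$(sed "s|NAME|${page}|g" "$template")
printf '%s\n' "$result"
```

Both lines of the cut-down template should come back with NAME swapped for the page path.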
Traversing the Directory Structure
For this I chose the following.
find . -print0 | while IFS= read -r -d '' file
do
echo "$file"
done
I did some testing, which produced a huge amount of output.
--snip--
./categories/coding/page/4/index.html
./categories/coding/page/3
./categories/coding/page/3/index.html
./categories/coding/page/2
./categories/coding/page/2/index.html
./categories/coding/page/5
./categories/coding/page/5/index.html
./categories/coding/index.html
./categories/automation
./categories/automation/page
./categories/automation/page/1
./categories/automation/page/1/index.html
./categories/automation/index.html
./categories/front-end
./categories/front-end/page
Straight away I noticed there would be an issue with the ./ at the start of each line. But I will get to that.
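For reference, one way to cut the listing down to just the HTML files and drop the leading ./ is sketched below. The temporary directory and the files in it are invented for the demo, and the -type f and -name filters are my assumption about what is wanted, not part of the original loop.

```shell
#!/usr/bin/env bash
# Sketch: list only HTML files and strip the "./" prefix that find adds.
set -eu

site=$(mktemp -d)    # hypothetical stand-in for the cloned site
trap 'rm -rf "$site"' EXIT
mkdir -p "$site/categories/coding"
touch "$site/index.html" "$site/categories/coding/index.html" "$site/style.css"

cd "$site"
paths=$(
    find . -type f -name '*.html' -print0 | while IFS= read -r -d '' file
    do
        echo "${file#./}"    # remove a leading "./" if present
    done | sort
)
echo "$paths"
```

The ${file#./} form removes the prefix only when it is actually there, which makes it a little more forgiving than cutting a fixed number of characters.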
What do I actually want to do
What I want to do is replace the contents of any file ending in .html with the contents of source, where NAME has been replaced with the path to the HTML file. Let’s take a look at a simple example from above: ./categories/coding/index.html needs to have its contents replaced with the contents of source, but with NAME replaced with categories/coding/index.html.
Example
<!DOCTYPE HTML>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="refresh" content="0;url=https://bas-man.dev/categories/coding/index.html" />
<link rel="canonical" href="https://bas-man.dev/categories/coding/index.html" />
</head>
<body>
<h1>
The page has been moved to <a href="https://bas-man.dev/categories/coding/index.html">https://bas-man.dev</a>
</h1>
</body>
</html>
The Final Script and Breakdown
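find . -print0 | while IFS= read -r -d '' file
do
    # Only rewrite paths whose last 4 characters are "html"
    if [ "${file: -4}" == "html" ] ; then
        # Fill in NAME with the path minus its leading "./", then overwrite the file
        cat ./source | sed "s|NAME|${file:2}|g" > "${file}"
    fi
done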
The if statement
- This checks whether the file name ends in html. To do this we use ${file: -4}, which returns the last 4 characters of file (the space before -4 is required, otherwise bash reads it as the :- default-value expansion). If the check is true we replace the contents of file.
Re-write the file contents
- Replacing the contents is done using classic tools: cat, sed and >, the output redirect.
  - I pass the contents of source to sed.
  - sed then replaces NAME with the value of file starting from the 3rd character. Offsets count from 0, so ${file:2} skips characters 0 and 1 (the leading ./).
  - I use double quotes since I am doing interpolation.
  - I use | instead of / as my separator, since my string also contains / characters.
  - I use the g (global) flag since I need to replace every occurrence of NAME on a line, not just the first.
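A slightly more defensive variant might look like the sketch below. This is not the script I actually ran: the scratch site, the q&a directory, and the escaping step are all assumptions added for illustration. It lets find do the .html filtering, keeps the template outside the site tree, and escapes characters that sed treats specially in a replacement (&, | and \) in case a path contains them.

```shell
#!/usr/bin/env bash
# Sketch of a more defensive variant of the update script (illustrative only).
set -eu

workdir=$(mktemp -d)                 # hypothetical scratch copy of the site
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/site/q&a" "$workdir/site/images"
printf 'old page\n' > "$workdir/site/q&a/index.html"
printf 'old page\n' > "$workdir/site/index.html"
printf 'not a page\n' > "$workdir/site/images/logo.png"

# Template lives outside the site tree, so find never sees it.
template="$workdir/source"
printf '<link rel="canonical" href="https://bas-man.dev/NAME" />\n' > "$template"

cd "$workdir/site"
find . -type f -name '*.html' -print0 | while IFS= read -r -d '' file
do
    path="${file#./}"                                       # drop the leading "./"
    escaped=$(printf '%s' "$path" | sed 's/[&|\\]/\\&/g')   # protect sed specials
    sed "s|NAME|${escaped}|g" "$template" > "$file"
done

cat "q&a/index.html"
```

Without the escaping step, an & in a path would be expanded by sed to the matched text (NAME) instead of being kept as a literal character.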
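With this done, I took the easy way and ran the script through a shell. The script was simply called update. Note that it uses bash-specific substring expansion, so it must be run with bash; plain sh only works on systems where sh is bash.
bash update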
All the HTML files now point to the exact same pages, but on the new domain.
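Before pushing, a quick sanity check can confirm that no page still carries the unreplaced placeholder. The sketch below builds a tiny fake site (one good page, one deliberately unprocessed) to show the idea; the directory layout is invented, and in practice the find command would be pointed at the real clone.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check rewritten pages before pushing.
set -eu

site=$(mktemp -d)   # hypothetical stand-in for the cloned public repo
trap 'rm -rf "$site"' EXIT
mkdir -p "$site/posts"
printf '<link rel="canonical" href="https://bas-man.dev/index.html" />\n' > "$site/index.html"
# Deliberately left unprocessed so the check has something to find.
printf '<link rel="canonical" href="https://bas-man.dev/NAME" />\n' > "$site/posts/index.html"

# List every .html file that still contains the unreplaced placeholder.
# grep exits non-zero when nothing matches, hence the "|| true".
leftover=$(find "$site" -type f -name '*.html' -exec grep -l 'bas-man.dev/NAME' {} + || true)

if [ -n "$leftover" ]; then
    echo "Placeholder still present in:"
    echo "$leftover"
else
    echo "All pages rewritten."
fi
```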
Update the old site
From here it was a simple matter to update the repo with some git commands and push the updates back up to GitHub.
git add .
git reset -- update
git reset -- source
git commit -m "my comment"
git push -u origin
The resets just unstage the two helper files, update and source, which I didn’t need pushed to the live site.
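An alternative to the two resets, which I did not use here, would be to list the helper files in .git/info/exclude so that git add . skips them in the first place. The sketch below tries this in a throwaway repo; the repo and its files are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: keep helper files out of commits via .git/info/exclude
# (an alternative to "git reset", shown in a throwaway repo).
set -eu

repo=$(mktemp -d)    # hypothetical throwaway repo
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
touch "$repo/index.html" "$repo/update" "$repo/source"

# Patterns here behave like .gitignore entries but stay local to this clone.
mkdir -p "$repo/.git/info"
printf '/update\n/source\n' >> "$repo/.git/info/exclude"

git -C "$repo" add .
staged=$(git -C "$repo" ls-files)   # files now in the index
echo "$staged"
```

Because the exclude file is never committed, the public repo stays free of both the helper files and any ignore rules for them.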
Closing
That’s about it. It seems to have worked pretty well. I just need to keep an eye on things and see if I need to make any changes.
Hope this might help others.