Hi All,
So yesterday I decided to move my site from github.io to one of my personal domains. I am not sure why I decided this; GitHub provides a great free service, and I had lots of automation in place. In this article I will cover how I went about updating all the pages with redirects from GitHub to the new domain.
The Goal
My goal was to leave my old site up on GitHub for a few weeks to give Google some time to update its index, with every old page redirecting to its new counterpart. This requires a redirect in place for each page. After that I will remove the GitHub site repo.
Limitations with GitHub
From my understanding, GitHub Pages has no support for server-side redirects. This left me with the option of using HTML/JavaScript redirects.
Resources and Research
I did a fair amount of reading before tackling this mini project.
- Check file extension - Stack Overflow
- Shell Parameter Expansion
- JavaScript SEO Surprise
- Setup a redirect on GitHub Pages
- JavaScript Redirect with Canonical
- JavaScript Page Redirect
- is HTML Canonical - Stack Overflow
- How to Redirect a Web Page
- Bash: Recursively Travel a Directory of N Levels - Stack Overflow
- Permanently Redirect GitHub Pages
Implementation
I chose to go with the HTML rewrite approach.
One of the reasons for this is that I build my site using two GitHub repos. One is private, with some workflow actions that populate the second, public repo. The public repo only has the generated webpage code.
Initial Setup
Since I am focusing on the rewrite script, I am going to assume you have done the following things.
- Have the new site up and running on the new domain.
- Disabled Google Analytics and AdSense on the old site, or already moved these properties to the new domain if you are using them.
Getting things into place.
In my case this was pretty straightforward. My site is served from a repo that only contains the page code. So, working in a separate area from my normal local copy of my private repo, I simply cloned the public site to my local machine.
git clone git@github.com:Bas-Man/bas-man.github.io.git
Within this directory, I then removed files and folders that would not be needed.
- css
- images
- fonts
Those sorts of things.
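The clean-up above can be sketched as a small shell function. This is only an illustration: remove_assets is a hypothetical helper name, and the three directory names are the examples listed above, so adjust them to whatever your own site generates.

```shell
# Hypothetical helper: delete asset directories that need no redirect.
# $1 = root of the cloned public repo.
# The directory names below are examples; adjust them to your own site.
remove_assets() {
    site_dir=$1
    for dir in css images fonts; do
        # -r removes recursively, -f ignores directories that do not exist.
        # ${site_dir:?} aborts if the variable is empty, so "/css" is never targeted.
        rm -rf -- "${site_dir:?}/$dir"
    done
}
```

From inside the cloned repo this would be called as remove_assets . and it leaves every other file alone.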
The script components
New HTML file contents
I decided to go with the following HTML code, with some adjustments.
<!DOCTYPE HTML>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <meta http-equiv="refresh" content="0;url={{THE_NEW_URL}}" />
        <link rel="canonical" href="{{THE_NEW_URL}}" />
    </head>
    <body>
        <h1>
            This page has been moved to <a href="{{THE_NEW_URL}}">{{THE_NEW_URL}}</a>
        </h1>
    </body>
</html>
Within this local repo I created a file called source with the following contents:
<!DOCTYPE HTML>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <meta http-equiv="refresh" content="0;url=https://bas-man.dev/NAME" />
        <link rel="canonical" href="https://bas-man.dev/NAME" />
    </head>
    <body>
        <h1>
            This page has been moved to <a href="https://bas-man.dev/NAME">https://bas-man.dev</a>
        </h1>
    </body>
</html>
The value NAME is just a placeholder and will be replaced in each file using the good old trusty sed command.
Traversing the Directory Structure
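To make the placeholder idea concrete, here is a minimal sketch of the substitution on a single line of the template. The page path is just an example value, and the variable names are made up for illustration.

```shell
# One line of the template, with NAME as the placeholder.
template_line='<link rel="canonical" href="https://bas-man.dev/NAME" />'
# Example page path, used only for illustration.
page='categories/coding/index.html'
# "|" is the sed delimiter so the slashes in the path need no escaping;
# the g flag replaces every NAME on the line, not just the first.
result=$(printf '%s\n' "$template_line" | sed "s|NAME|$page|g")
printf '%s\n' "$result"
# prints: <link rel="canonical" href="https://bas-man.dev/categories/coding/index.html" />
```

One caveat with this approach: a path containing | or & would confuse the sed expression, so it assumes ordinary page names.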
For this I chose the following.
find . -print0 | while IFS= read -r -d '' file
do 
    echo "$file"
done
I did some testing, which gave a huge amount of output.
--snip--
./categories/coding/page/4/index.html
./categories/coding/page/3
./categories/coding/page/3/index.html
./categories/coding/page/2
./categories/coding/page/2/index.html
./categories/coding/page/5
./categories/coding/page/5/index.html
./categories/coding/index.html
./categories/automation
./categories/automation/page
./categories/automation/page/1
./categories/automation/page/1/index.html
./categories/automation/index.html
./categories/front-end
./categories/front-end/page
Straight away I noticed there would be an issue with the ./ at the start of each line. But I will get to that.
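As a preview of the fix, shell parameter expansion can drop that prefix. The sketch below uses pattern-based prefix removal, which works in any POSIX shell; the final script uses the Bash offset form ${file:2} instead, which gives the same result for paths that start with ./ as find prints them here.

```shell
# A path exactly as find prints it.
file='./categories/coding/index.html'
# Remove a leading "./" if present (POSIX prefix removal).
relative=${file#./}
printf '%s\n' "$relative"
# prints: categories/coding/index.html
```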
What do I actually want to do
What I want to do is replace the contents of any file ending in .html with the contents of source, where NAME has been replaced with the path to the HTML file. Let's take a look at a simple example from above.
./categories/coding/index.html needs to have its contents replaced with the contents of source but with NAME replaced with categories/coding/index.html.
Example
<!DOCTYPE HTML>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <meta http-equiv="refresh" content="0;url=https://bas-man.dev/categories/coding/index.html" />
        <link rel="canonical" href="https://bas-man.dev/categories/coding/index.html" />
    </head>
    <body>
        <h1>
            This page has been moved to <a href="https://bas-man.dev/categories/coding/index.html">https://bas-man.dev</a>
        </h1>
    </body>
</html>
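The Final Script and Breakdown
# Walk every path under the current directory, NUL-delimited so odd filenames survive
find . -print0 | while IFS= read -r -d '' file
do
    # Only touch paths whose last four characters are "html"
    if [ "${file: -4}" == "html" ] ; then
       # Fill in the template; ${file:2} drops the leading "./"
       cat ./source | sed "s|NAME|${file:2}|g" > "${file}"
    fi
done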
The if statement
- This checks whether the file name ends in html. To do this we use ${file: -4}, which returns the last 4 characters of file. If this is true, we replace the contents of file.
Re-write the file contents
- Replacing the contents is done using classic old tools: cat, sed, and >, the output redirect.
  - I pass the contents of source to sed.
  - sed then replaces NAME with the value of file starting from the 3rd character. Offsets count from 0, so ${file:2} skips characters 0 and 1 (the leading ./) and starts at the third.
  - I use double quotes since I am doing interpolation.
  - I use | instead of / as my separator since my string also contains / characters.
  - I use the g option since I need to replace all occurrences of NAME. Here g means global (every match on a line), not greedy.
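For comparison, here is a sketch of the same idea that avoids the Bash-only pieces. It is not the script I ran: rewrite_html is a hypothetical function name, and it lets find select the .html files itself rather than testing the last four characters in the loop.

```shell
# Hypothetical helper: rewrite every .html file under a directory as a redirect stub.
# $1 = directory to process, $2 = template file containing the NAME placeholder.
# Assumes the template itself does not end in .html inside that directory,
# and that page paths contain no "|" or "&" (both are special to sed here).
rewrite_html() {
    dir=$1
    template=$2
    # -name '*.html' makes find do the extension check;
    # -exec ... {} + passes the matches as arguments, so spaces in names are safe.
    find "$dir" -type f -name '*.html' -exec sh -c '
        template=$1; root=$2; shift 2
        for file do
            # Path relative to the site root, e.g. categories/coding/index.html
            rel=${file#"$root"/}
            sed "s|NAME|$rel|g" "$template" > "$file"
        done
    ' sh "$template" "$dir" {} +
}
```

From inside the cloned repo the call would be rewrite_html . ./source, which should produce the same files as the loop above.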
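With this done, I took the easy way and ran the script through a shell. The script was simply called update. Note that ${file: -4} and read -d are Bash features, so this relies on sh being a Bash-compatible shell; if yours is not, run it with bash instead.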
sh update
And all the HTML files are updated with the correct references to the exact same pages, but to the new domain.
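Before pushing, it may be worth checking that nothing was missed. The sketch below is an optional extra, not something from my original run: find_unconverted is a hypothetical helper that lists any .html file that does not mention the new domain.

```shell
# Hypothetical check: list .html files that do NOT contain the expected string.
# $1 = directory to scan, $2 = string expected in every rewritten page.
find_unconverted() {
    find "$1" -type f -name '*.html' -exec sh -c '
        needle=$1; shift
        for file do
            # -q: quiet, -F: treat the needle as a fixed string, not a regex
            grep -q -F -- "$needle" "$file" || printf "%s\n" "$file"
        done
    ' sh "$2" {} +
}
```

Running find_unconverted . https://bas-man.dev/ from the repo root should print nothing if every page was rewritten.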
Update the old site
From here it was a simple matter of updating the repo with some git commands and pushing the updates back up to GitHub.
git add .
git reset -- update
git reset -- source
git commit -m "my comment"
git push -u origin
The resets were just to unstage the two files I didn't need pushed to the live site.
Closing
That's about it. It seems to have worked pretty well. I just need to keep an eye on things and see if I need to make any changes.
Hope this might help others.