Saturday, 12 August 2017

Squashing range of old commits in git

        Changing the commit history of a branch is usually not the best idea and should be done only in very specific cases. Above all, rewriting history causes a great headache for anyone who has changes in a branch created from the one you are changing. My case was very specific: I had to squash old commits in the develop branch, which until then had been used by pretty much a single developer. The work so far had been about creating a demo application, which prompted the idea of hiding all these commits behind one "demo preparation" commit. Here are the steps I took to squash selected commits of the branch. I had a freshly installed git on a Windows machine at my disposal.

    1.      I've branched off of develop and created develop-squash, a branch I could work on
     
    2.      I've cloned the same repository in a separate location - I like to have an option to manually check different commits in the other repository, while I'm working on a rebase in my initial repo
    3.      I've set the default git text editor to Notepad++:
     
    git config --global core.editor "'C:/Program Files (x86)/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"

    4.      I've launched interactive rebase to change the whole history of the branch:

    git rebase -i --root

    This opens the interactive rebase todo file in Notepad++ with a list of all commits in the branch. --root makes the list start from the very first commit. To rewrite the history only from a specific commit onwards, pass that commit's hash instead of --root; the list then contains the commits that come after it. The list will not contain merge commits from other branches.

    Each commit in the list is set to "pick", which means the commit is kept as-is during the rebase. To squash a commit into the one listed above it, change "pick" to "squash" (or "s" for short). The first commit in the list has to stay on "pick", since it has nothing above it to be squashed into. We can also drop specific commits or reorder them as we see fit.
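Editing a long todo list by hand gets tedious, so here is a minimal Python sketch of the same edit: it keeps the first "pick" and turns every later one into "squash". The function name and the sample hashes are made up for illustration, and it squashes everything rather than a hand-picked selection, so treat it as a starting point only.

```python
def squash_all_but_first(todo_text: str) -> str:
    """Return the todo list with every 'pick' after the first changed to 'squash'."""
    out = []
    seen_first_pick = False
    for line in todo_text.splitlines():
        parts = line.split(maxsplit=1)
        if parts and parts[0] in ("pick", "p"):
            if seen_first_pick:
                # Squash this commit into the one listed above it.
                rest = parts[1] if len(parts) > 1 else ""
                line = f"squash {rest}".rstrip()
            else:
                # The first commit must stay a pick: nothing precedes it.
                seen_first_pick = True
        out.append(line)  # comments and blank lines pass through unchanged
    return "\n".join(out) + "\n"


# Hypothetical todo content, shaped like what git writes for the rebase.
sample = (
    "pick 1a2b3c4 Initial commit\n"
    "pick 5d6e7f8 Add demo screen\n"
    "pick 9a0b1c2 Fix demo data\n"
    "# Rebase onto root\n"
)
print(squash_all_but_first(sample))
```

Git also honours the GIT_SEQUENCE_EDITOR environment variable for this file, so a small script built around such a function could probably stand in for the manual edit. I haven't tried that in this setup.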

    5.      Once the rebase list is ready, we save the file and close Notepad++. Git picks this up and runs the rebase, applying the commits one by one. Whenever there's an issue, git stops the rebase and lets us act.
     
    6.      Whenever the rebase stops we get a chance to see what exactly went wrong and fix it. What usually happens is that git tries to flatten the history of the sub-branches that were merged into the branch we're squashing, which requires us to resolve all the conflicts. After resolving the conflicts with one of the many diff tools, we can continue the rebase.

    7.      Continue the rebase by executing:

    git rebase --continue

    Or abort it:

    git rebase --abort

    8.      Quite often, to make sure I was resolving the conflicts correctly, I checked a later point in the commit history in the second repository I had cloned earlier.

    9.      After the rebase finished I switched the names of the branches: the initial develop became develop-legacy, and develop-squash became develop:

    git branch -m develop develop-legacy
    git branch -m develop-squash develop

    10.   Sort out master branch and HEAD:

    I've deleted the remote develop branch and pushed the new one in. Then I've overwritten the remote master branch with my develop as well:

    git push -f origin develop:master

    Assuming the remote HEAD points to master, as it usually does, it now resolves to the rewritten history, which leaves our branch structure how it should be.

    11.   Cleanup

    At this point we still have our develop-legacy branch, which is the old branch with all our old commits. When we remove it, we can remove pretty much all the associated tags and sub-branches along with it, since we won't need this history anymore now that we've rewritten it.

    If we only delete the develop-legacy branch, our old commits will still hang around, kept alive by old tags or sub-branches. Once no branches or tags refer to them, the git garbage collector should eventually remove the orphaned commits (we can also run it manually).
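The cleanup could be sketched roughly as below. This is not what I ran at the time; it is a small Python helper (the name cleanup_commands and the tag name are hypothetical) that only builds the git commands, so they can be reviewed before anything is deleted. Note that the local reflog also keeps old commits reachable, which is why the sketch expires it before running the garbage collector.

```python
def cleanup_commands(legacy_branch, old_tags=(), old_branches=()):
    """Build the git commands that drop the old history (nothing is executed here)."""
    cmds = [["git", "branch", "-D", legacy_branch]]
    # Sub-branches and tags that still point into the old history.
    cmds += [["git", "branch", "-D", branch] for branch in old_branches]
    cmds += [["git", "tag", "-d", tag] for tag in old_tags]
    # The reflog still references the old commits, so expire it first.
    cmds.append(["git", "reflog", "expire", "--expire=now", "--all"])
    # Force the garbage collector to prune unreachable commits right away.
    cmds.append(["git", "gc", "--prune=now"])
    return cmds


# "v0.1-demo" is a made-up tag name used only for illustration.
for cmd in cleanup_commands("develop-legacy", old_tags=["v0.1-demo"]):
    print(" ".join(cmd))
```

Each printed line can be run by hand, or passed to subprocess.run(cmd, check=True) once you are sure the old history is really not needed anymore.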


All in all, there are most probably better ways of doing it. This was something I've cobbled together. I'd be happy to learn about other options which I'm sure git has in stock.

Wednesday, 2 August 2017

Adding tiled watermark text to documents

        Recently I really wanted to start adding watermark texts to different documents with my personal information in them. These are usually related to the hiring process. There are different documents, which we frequently send out whenever we are getting x-rayed by this or that company, which is trying to hire us.

A watermark can help ensure that the documents we send, if they ever were to appear somewhere, somehow, are easily traceable back to the source. Obviously it is possible to remove a watermark, especially from a shabby-quality photocopy. But at least it adds one extra step, which perhaps some people will not be willing to go through.

The watermark text itself could state to whom the document is being sent and who created it.

I wanted to be able to scan any document on my home scanner, have it in JPG format and add a slanted, tiled watermark text of my choosing. I also wanted to be able to script this operation for multiple files.

Obstacles:
- all printers I had access to printed in PDF
- there's no free, simple way of converting PDF to JPG without sending your data to some sketchy website
- adding watermark automatically to multiple documents (otherwise the easiest way would be to use a free tool such as GIMP to add such watermark text manually)

I've done a bit of digging, and the simplest solution for me requires installing 2 programs:

- ImageMagick (a console based image processing app)
- Ghostscript (a PDF renderer)

ImageMagick has its own PDF to JPG conversion command, but it relies on having Ghostscript installed underneath. For batch jobs you can use Ghostscript directly for better performance; I've used the ImageMagick version for ease of use.

Once everything is installed you can use the following command to convert a PDF to a JPG, given there's a Watermarking directory in your ImageMagick folder:

convert -density 150 -trim Watermarking\test.pdf -quality 100 -flatten -sharpen 0x1.0 Watermarking\test.jpg

You can customize the quality using different parameter values, for more info check: https://www.imagemagick.org/script/convert.php

Next we can add a watermark to the resulting image:

convert -font Arial -pointsize 40 -size 430x270 xc:none -fill #80808080 -gravity NorthWest -draw "rotate 15 text 5,0 'Mr XYZ Company ABC'" miff:- | composite -tile - Watermarking\test.jpg Watermarking\watermarked_test.jpg

You can read more on the parameters used above under the following url: http://www.imagemagick.org/Usage/annotating/#wmark_text
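To make the watermark text and its look easy to change, the two-stage command above can be wrapped in a bit of Python. This is only a sketch: the function names are mine, the defaults simply mirror the values used above, and it assumes convert and composite are reachable under those names.

```python
import subprocess


def watermark_commands(text, src, dst, pointsize=40, tile="430x270",
                       angle=15, fill="#80808080", font="Arial"):
    """Build the two commands: draw one text tile, then tile it over the image."""
    # Note: a text containing a single quote would break the -draw string.
    stamp = [
        "convert", "-font", font, "-pointsize", str(pointsize),
        "-size", tile, "xc:none", "-fill", fill, "-gravity", "NorthWest",
        "-draw", f"rotate {angle} text 5,0 '{text}'", "miff:-",
    ]
    tile_cmd = ["composite", "-tile", "-", src, dst]
    return stamp, tile_cmd


def run_watermark(text, src, dst, **options):
    """Run both stages, feeding the tile image to composite through stdin."""
    stamp, tile_cmd = watermark_commands(text, src, dst, **options)
    tile_image = subprocess.run(stamp, check=True, capture_output=True).stdout
    subprocess.run(tile_cmd, input=tile_image, check=True)


# Only print the commands here; call run_watermark(...) to actually execute them.
for cmd in watermark_commands("Mr XYZ Company ABC",
                              r"Watermarking\test.jpg",
                              r"Watermarking\watermarked_test.jpg"):
    print(cmd)
```

Passing the arguments as a list avoids having to worry about how the shell quotes the -draw string or the # in the fill color.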

[Image: example pre-watermark]

[Image: example post-watermark]

With the parameters in the command line you can, among other things, set the point size of the text, the size of the canvas the text is first drawn on (which determines how densely the text is tiled), the rotation of the text and the color of the font.

Once you have these commands it's really easy to parameterise them and use PowerShell, batch or another scripting tool to run them for all PDFs/JPGs in a given folder.
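As a rough illustration of the batch part, here is a Python sketch that builds the PDF to JPG command for every PDF in a folder. The helper names and the install path are assumptions of mine (adjust the path to wherever your ImageMagick lives), and the script only prints the commands rather than running them.

```python
from pathlib import Path


def pdf_to_jpg_command(convert_exe, pdf, density=150, quality=100):
    """Build the conversion command for one PDF; the JPG lands next to it."""
    jpg = pdf.with_suffix(".jpg")
    return [
        str(convert_exe), "-density", str(density), "-trim", str(pdf),
        "-quality", str(quality), "-flatten", "-sharpen", "0x1.0", str(jpg),
    ]


def batch_commands(folder, convert_exe):
    """One conversion command per PDF in the folder, in alphabetical order."""
    return [pdf_to_jpg_command(convert_exe, pdf)
            for pdf in sorted(Path(folder).glob("*.pdf"))]


if __name__ == "__main__":
    # Hypothetical install location; a full path avoids picking up the
    # wrong convert.exe on Windows.
    exe = Path(r"C:\Program Files\ImageMagick\convert.exe")
    for cmd in batch_commands("Watermarking", exe):
        print(cmd)  # swap for subprocess.run(cmd, check=True) to execute
```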

Troubleshooting:
- On Windows you might want to pass the full path to ImageMagick's convert.exe, otherwise Windows may pick up another convert.exe instead, a system executable used for converting file systems.