ARCHIVE NOTICE

My website can still be found at industrialcuriosity.com, but I have not been posting on this blog as I've been primarily focused on therightstuff.medium.com - please head over there and take a look!

Sunday 25 October 2020

Keeping Track of your Technical Debt

The impact of technical debt

Over the years, "technical debt" has gone from a concept to a phrase that generates anxiety and a lack of trust, and sets developers and their managers up for failure. The metaphor might not be perfect (Ralf Westphal makes a strong case for treating it like an addiction), but I feel it's pretty apt if you think of the story of The Pied Piper of Hamelin - if you don't pay the piper promptly for keeping you alive, he'll come back to steal your future away.

Maybe I'm being a bit dramatic, but I value my time on this planet and over the course of two decades as a software developer in various sectors of the industry I have (along with countless others) paid with so much of my own precious lifetime (sleep hours, in particular) for others' (and occasionally my own) quick fixes, rushed decisions and hacky workarounds that it pains me to even think about it.

It's almost always worthwhile investing in doing things right the first time, but we rarely have the resources to invest and we are often unable to accurately predict what's right in the first place.

So let's at least find a way to mitigate the harm.

Why TODOs don't help

The traditional method of recording technical debt is the TODO. You write a well-meaning (and hopefully descriptive) comment starting with TODO, and "Hey, presto!" you have a nice, easy way to find all those little things you meant to fix. Right? Except that's usually not the case. We rarely make time to go looking for more things to do, and even when we can, with modern software practices it's unlikely that searching across all of our different repositories will be effective.

What generally ends up happening, then, is that we only come across TODOs by coincidence, when we happen to be working with the code around them - and chances are they were written by a different developer in a different era and explained in... suboptimal language.

"Why hasn't this been done?", you may well ask.

"Leave that alone! We don't remember why it works," you may well be told.

Track your technical debt with this one simple trick!

No matter the size of your team (even a team of one), you should always be working with some kind of issue tracking software. There's decent, free software out there, so there's really no excuse.

The fix? All you need to do is log a ticket for each TODO. Every time you find yourself writing a TODO, and I mean every single time, create a ticket. In both the ticket and the TODO comment, explain what you're doing, what needs to be done, why it needs to be deferred, and how urgent completing the TODO task is.

Then reference the ticket in the TODO comment.
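For example (the ticket ID here is obviously made up):

// TODO (PROJ-1234): this retry loop papers over intermittent timeouts from the
// billing service. Replace it with proper backoff once that service stabilizes;
// deferred because the fix needs coordination with the billing team. Medium urgency.
// Details and discussion: PROJ-1234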

This way you'll have a ticket - which I like to think of as an IOU - that can be added to the backlog and remembered during grooming and planning. It also gives any developer who encounters the TODO in the code a way to review the details and any subsequent conversations in the ticket.

One interesting side effect of this approach? I often find that it's more effort to create the TODO ticket than to just do the right thing in the first place, which can be a great incentive for junior developers to avoid deferring the work at all.

Another?


After establishing with my team that I will not approve a TODO unless there's a ticket attached, I must admit... I do sleep just a little bit better at night.

Saturday 24 October 2020

Reading and writing regular expressions for sane people

Your regular expressions need love. Reviewers and future maintainers of your regular expressions need even more.

No matter how well you've mastered regex, regex is regex and is not designed with human-readability in mind. No matter how clear and obvious you think your regex is, in most cases it will be maintained by a developer who a) is not you and b) lacks context. Many years ago I developed a simple method for sanity checking regex with comments, and I'm constantly finding myself demonstrating its utility to new people.

There are some great guides out there, like this one, but what I'm proposing takes things a step or two further. It may take a minute or two of your time, but it almost invariably saves a lot more than it costs. I'm not even discussing flagrant abuse or performance considerations.


Traditional regex: the do-it-yourself pattern


The condescending regex. Here you're left to your own devices. Thoughts and prayers.

var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})(0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

(example taken from this question)


Kind regex: intention explained


It's the least you can do! A short line explaining what you're matching with an example or two (or three).

// parse a url, only capture the host part
// eg. protocol://host
//     protocol://host:port/path?querystring#anchor
//     host
//     host/path
//     host:port/path
var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})(0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;


Careful regex: a human-readable breakdown


Here we ensure that each element of the regex pattern, no matter how simple, is explained in a way that makes it easy to verify that it's doing what we think it's doing, and safe to modify. If you're not an expert with regex, I recommend using one of the many available tools, such as regexr.com.

// parse a url, only capture the host part
//     (?:([A-Za-z]+):)?
//         protocol - an optional alphabetic protocol followed by a colon
//     (\/{0,3})(0-9.\-A-Za-z]+)
//         host - 0-3 forward slashes followed by alphanumeric characters
//     (?::(\d+))?
//         port - an optional colon and a sequence of digits
//     (?:\/([^?#]*))?
//         path - an optional forward slash followed by any number of
//         characters not including ? or #
//     (?:\?([^#]*))?
//         query string - an optional ? followed by any number of
//         characters not including #
//     (?:#(.*))?
//         anchor - an optional # followed by any number of characters
var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})(0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

Now that we've taken the time to break this down, we can identify the intention behind the patterns and ask better questions: why is the host the only matched group? Was this tested? (Because (0-9.\-A-Za-z] is clearly an error, and there are almost no restrictions on invalid characters)

Unless you're a sadist (or a masochist), this is definitely a better way to operate: be careful, and if you can't be careful then at least be kind.

Wednesday 19 August 2020

Priority pinning in apt-preferences with different versions and architectures

I'm posting this because I've lost too many hours figuring it out myself; the documentation is missing several important notes, and I haven't found any forum posts that really address this scenario:

Question: How do I prioritize specific package versions for multiple architectures? In particular, I have a number of different packages which I would like to download for multiple architectures, and I would like to prioritize the versions so that if they're not explicitly provided, apt will try to get a version matching my arbitrary requirements (in my case, the current git branch name) and fall back to our develop branch versions.

eg. I would like to download package my-package for both i386 and amd64 architectures and I would like to pull the latest version that includes my-git-branch-name before falling back to the latest that includes develop.

Answer:

The official documentation is here.

1. In order to support multiple architectures, all packages being pinned must have their architecture specified, and there must be an entry for each architecture. A pinning for the package name without the architecture specified will only influence the default (platform) architecture:

Package: my-package

Pin: version /your regex here/

Pin-Priority: 1001

2. The entries are whitespace-sensitive, although no errors will be reported if you indent them. The following pinning will be silently disregarded:

Package: my-package:amd64

    Pin: version /your regex here/

    Pin-Priority: 1001

3. apt update must be called after updating the preferences file in order for the pins to be respected, as well as after adding additional architectures using (for example) dpkg --add-architecture i386

The following excerpt from /etc/apt/preferences solves the stated problem:


Package: my-package:amd64

Pin: version /-my-git-branch-name-/

Pin-Priority: 1001


Package: my-package:i386

Pin: version /-my-git-branch-name-/

Pin-Priority: 1001


Package: my-package:amd64

Pin: version /-develop-/

Pin-Priority: 900


Package: my-package:i386

Pin: version /-develop-/

Pin-Priority: 900 

It may be worth noting that to download or install a package with a specific architecture and version, use the command apt download package-name:arch=version
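For example (with a made-up version string):

apt download my-package:i386=1.2.3-my-git-branch-name-7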

Sunday 16 August 2020

Enabling Programmable Bank Cards To Do All The Things

A little while back, OfferZen and Investec opened up their Programmable Bank Card Beta for South Africans, taking over from Root, and I was excited to be able to sign up!

My long-term goal with programmable banking is to integrate crypto transactions directly with my regular banking - which might be possible quite soon as Investec is starting to roll out its open API offering in addition to the card programme - and as far as I can tell that could be a real game-changer in terms of making crypto trading practical for non-investment purposes.
When I joined, though, what I found was a disparate group of really interesting projects and only a two-second time window to influence whether your card transaction will be approved or not.

I decided to bide my time by building a bridge for everyone else's solutions, one which would provide a central location to determine transaction approval and enable card holders (including myself, of course) to forward their transactions securely to as many services as they like without costing them running time on their card logic.

It was obvious to me that this needed to be a cloud solution, and as someone with zero serverless experience [1] I spent some time evaluating AWS, Google and Microsoft's offerings. My priorities were simple:
  • right tool for the job
  • low cost
  • high performance
  • good user experience
In the end AWS was the clear winner with the first two priorities, so much so that I didn't even bother worrying about any potential performance tradeoffs (there probably aren't any) and I was prepared to put up with their generally mixed bag of user experience. I also had the added incentive that my current employer uses AWS in their stack so I would be benefiting my employer at the same time.

Overall, I'm very glad that I chose AWS in spite of the trials and tribulations, which you can follow in my earlier post about building a template for CDK for beginners (like myself). In addition to learning how to use CDK, this project has given me solid experience in adapting to cloud architecture and patterns that are substantially different from anything I've worked with before, and I've already been able to effectively apply my learnings to my contract work.

One of the obligations of the Programmable Bank Card beta programme is sharing and presenting your work. I was happy to do so, but for months I've been locked down and working in the same space as my wife and four-year-old (we don't have room for an office or much privacy in our apartment); with all the distractions of home schooling, and having to work while my wife and kid play my favourite video games, I've barely been able to keep up my hours with my paid gig - so making time for extra stuff hasn't been so easy.

A couple of weeks ago my presentation was due, and I spent a lot of the preceding two weekends in a mad scramble to get my project as production-ready as I could - secure, reliable, usable. Half an hour before "go time" I finally deployed my offering, and I was a little bit nervous doing a live demo... I mean, sure, I'd tested all the end points after each deployment - but as we all know: Things Happen. Usually During Live Demos.

Fortunately, the presentation went very well! I spent the following weekend adding any missing "critical" functionality (such as scheduling lambda warmups and implementing external card usage updates), and I'm hoping that some of my fellow community members get some good use out of it (whether on their own stacks or mine).

The code can be found here on GitLab, and a few days ago OfferZen published the recording of my presentation on their blog along with its transcript [2] and my slide deck.

Thank you for joining me on my journey!



[1] Although I was employed by one of the major cloud providers for a while, I've only worked "on" their cloud rather than "in" it, and while I do have extensive experience with Azure, none of it included serverless functions.

[2] The transcript has a few minor transcription errors, but they're more amusing than confusing.

Tuesday 28 July 2020

Resolving private PyPI configuration issues

The Story

I've spent the last few days wrestling with the serverless framework, growing fonder and fonder of AWS' CDK at every turn. I appreciate that serverless has been, until recently, very necessary and very useful, but I feel confident in suggesting that when deployment mechanisms become more complex than the code they're deploying, things aren't moving in the right direction.

That said, this post is about overcoming what for us was the final hurdle: getting Docker to connect to a private PyPI repository for our Python lambdas and lambda layers.

Python is great - kind of - but it's marred by two rather severe complications: the split between python and python3, and pip… so it's a great language with an awful tooling experience. Anyway, our Docker container couldn't communicate with our PyPI repo, and it took way too long to figure out why. Here's what we learned:

The Solution

If you want to use a private PyPI repository without typing in your credentials at every turn, there are two options:

1.


~/.pip/pip.conf:
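Something along these lines, with a made-up private index at pypi.example.com and credentials someuser / p@ssw0rd - note that the password is URL encoded here:

[global]
extra-index-url = https://someuser:p%40ssw0rd@pypi.example.com/simple/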



2.


~/.pip/pip.conf:
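The same made-up index, but with the credentials left out of the URL (pip will read them from ~/.netrc):

[global]
extra-index-url = https://pypi.example.com/simple/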



~/.netrc:
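And the matching credentials - NOT URL encoded this time:

machine pypi.example.com
login someuser
password p@ssw0rd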



It is important that ~/.netrc has the same owner:group as pip.conf, and that its permissions are 0600 (chmod 0600 ~/.netrc).

What's not obvious - or even discussed anywhere - is that special characters are problematic and are handled differently by the two mechanisms.

In pip.conf, the password MUST be URL encoded.
In .netrc, the password MUST NOT be URL encoded.

The Docker Exception

For whatever reason, solution 2 (the combination of pip.conf and .netrc) does NOT work with Docker.

Conclusion

Amazon's CDK is excellent, and unless you have a very specific use-case that it doesn't support it really is worth trying out!

Oh! And that Python is Very Nice, but simply isn't nice enough to justify the cost of its tooling.

Friday 3 July 2020

the rights and wrongs of git push with tags

My employer uses tags for versioning builds.

git push && git push --tags

I'm guessing this is standard practice, but we've been running into an issue with our build server - Atlassian's Bamboo - picking up commits without their tags. The reason was obvious: we've been using git push && git push --tags, pushing tags after the commits, and Bamboo doesn't trigger builds on tags, only on commits. This means that every build would be versioned with the previous version's tag!

The solution should have been obvious, too: push tags with the commits. Or before the commits? This week, we played around with an experimental repo and learned the following:

git push --tags && git push

To the uninitiated (like us), the result of running git push --tags can be quite surprising! We created a commit locally, tagged it*, and ran git push --tags. This resulted in the commit being pushed to the server (Bitbucket, in our case) along with its tag, but the commit was rendered invisible: not even git ls-remote --tags origin would return it, and it was not listed under the commits on its branch, although the tag and its commit showed up quite clearly in Bitbucket's search-commits-by-tag feature:



* All tags are annotated, automatically, courtesy of SourceTree. If you're not using SourceTree then you should be, and I promise you I'm not being paid to say that. If you must insist on the barbaric practice of using the command line, just add -a (and -m to specify the message inline if you don't want the editor to pop up; check out the documentation for more details).

Pushing a tag first isn't the end of the world - simply pushing the commit with git push afterwards puts everything in order - unless someone else pushes a different commit first. At that point the original commit will need to be merged, with merge conflicts to be expected. Alternatively - and relevant for us - any scripts that perform commits automatically might fail between pushing the tags and their commits, leading to "lost" commits.

git push --tags origin refs/heads/develop:refs/heads/develop

This ugly-to-type command does what we want it to do: it pushes the commit and its tags together. From the documentation:

When the command line does not specify what to push with <refspec>... arguments or --all, --mirror, --tags options, the command finds the default <refspec> by consulting remote.*.push configuration, and if it is not found, honors push.default configuration to decide what to push (See git-config[1] for the meaning of push.default).

Okay, so that works. But there's no way I'm typing that each time, or asking any of my coworkers to. I want English. I'm funny that way.

git push --follow-tags

Here we go: the real solution to our problems. As long as the tags are annotated - which they should be anyway, see the earlier footnote on tagging - running git push --follow-tags will push any outstanding commits from the current branch along with their annotated tags. Any tags not annotated, or not attached to a commit being pushed, will be left behind.
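If you'd rather not type the flag at all, there's also a git configuration option that makes --follow-tags the default behaviour for every push:

git config --global push.followTags true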

Resolved!

Monday 22 June 2020

A templated guide to AWS serverless development with CDK

If all you're looking for is a no-frills, quick, step-by-step guide, just scroll on down to The Guide!


The Story

Or, how I learned to let go of my laptop and embrace The Cloud...


Motivation


A while ago I outlined a project that I intend to build, one that I'm rather enthusiastic about, and after spending a few hours evaluating my options I decided that an AWS solution was the right way to go: their options are configurable and solid, their pricing is very reasonable and, as an added bonus, I get to familiarize myself with tech I'll be using for my current contract!

All good.

Once I'd made the decision to use AWS, becoming generally familiar with the products on offer was easy enough and it took me very little time to flesh out a design. I was ready to get coding!

Or so I thought...


Hurdle 1

Serverless. And possibly other frameworks - I'm pretty sure I looked at a few, but it seems like forever ago. Serverless shows lots of promise, and I know people who swear by it, but it's only free up to a point and I immediately ran into strange issues with my first deployment. I found the interface surprisingly unhelpful, and it looks like once you're using it you're somewhat locked in.

Pass.


Hurdle 2

Cognito. Security first. Cognito sounds like a great solution, but once I'd gotten a handle on how it works I was severely disappointed by how limiting it is, and getting everything set up takes real developer and user effort (and I suffer enough from poor interface fatigue). After playing around with user pools and looking at various code samples, I realized that I'd rather let users register with their email addresses / phone numbers as second-factor authentication (mailgun and twilio are both straightforward options), or use OAuth providers like Facebook, Google and GitHub. I also want to encourage my users to use strong, memorable passwords (easily enforced with zxcvbn - and why is this still a thing?!), which Cognito doesn't allow for.

You'll need to configure Lambda authorizers either way, so I really don't think Cognito adds much value.


Hurdle 3

Lambda / DynamoDB. Okay, so writing a lambda function is really easy, and guides and examples for code that reads from and writes to DynamoDB abound. Great! Except for the part where you need to test your functions before deploying them.

My first big mistake - and so far it's proved to be the most expensive one - was not understanding at this point that "testing locally" is simply not a feasible strategy for a cloud solution.


Hurdle 4

The first code I wrote for my project was effectively a lambda environment emulator to test my lambda functions. It was far from perfect, and it did take me a couple of hours to cobble together, but it did the job and I used it to successfully test lambda functions against DynamoDB running in Docker.


Hurdle 5

Lambda Layers. Why do most guides not touch on these? Why are there so few layers guides written for Javascript? It took me a little while to get a handle on layers and build a simple script to create them from package.json files, but as far as hurdles go this was a relatively short one.


Hurdle 6

Deployment. It's nice to have code running locally, but uploading it to the cloud? Configuring API Gateway was a mixed bag of Good Interface / Bad Interface, same with the Lambda functions, and what eventually stumped me was setting up IAM to make everything play nicely together. What's the opposite of intuitive? Not counter-intuitive, in this case, as I don't feel that that word evokes nearly enough frustration.

Anyway, it became abundantly clear at that point that manual deployment of AWS configurations and components is not a viable strategy.


Hurdle 7

A coworker introduced me to two tools that could supposedly Solve All My Problems: CDK and SAM. This seemed like a worthy rabbit-hole to crawl into, but I couldn't find any examples of the two working together!

I began to build my own little framework, one that would allow me to configure my stack using CDK, synthesize the CloudFormation templates, and test locally using SAM. Piece by piece I put together this wonderful little tool, hour by hour, first handling DynamoDB, then Lambda functions, then Lambda Layers...

It was at that point that realization dawned: not only are SAM and CDK not interoperable by design, but SAM does not, in fact, provide meaningful "local" testing. Sure, you can invoke your lambda functions on your local machine, but the intention is to invoke them against your deployed cloud infrastructure. Once I got that through my head, it was revelation time: testing in the cloud is cheaper and better than testing locally.


The Guide

If you're like me, and you intend your first foray into cloud computing to be simple, yet reasonably production-ready, CDK is the easiest way forward and it's completely free (assuming you don't factor in your time figuring it out as valuable, but that's what I'm here for!).

Over the course of the past couple of weeks, I've put together the aws-cdk-js-dev-guide. It's a work in progress (next stop, lambda authenticators!), but at the time of writing this guide it's functional enough to put together a simple stack that includes Lambda Layers, DynamoDB, functions using both of those, API Gateway routes to those functions and the IAM permissions that bind them.

And that's just the tip of the CDK iceberg.


Step 1 - Tooling

It is both valuable and necessary to go through the following steps prior to creating your first CDK project:

  • Create a programmatic user in IAM with admin permissions
  • If you're using Visual Studio Code (recommended), configure the aws toolkit
  • Set up credentials with the profile ID "default"
  • Get your 12 digit account ID from My Account in the AWS console
  • Follow the CDK hello world tutorial


Step 2 - Creating a new CDK project

The first step to creating a CDK project is initializing it with cdk init, and a CDK project cannot be initialized if the project directory isn't empty. If you would like to use an existing project (like aws-cdk-js-dev-guide) as a template, bear in mind that you will have to rename the stack in multiple locations and it would probably be safer and easier to create a new project from scratch and copy and paste in whatever bits you need.


Step 3 - Stack Definition

There are many viable approaches to setting up stages; mine is to replicate my entire stack for development, testing and production. If my stack wasn't entirely serverless - if it included EC2 or Fargate instances, for example - then this approach might not be feasible from a cost point of view.

Stack definitions are located in the /lib folder, and this is where the stacks and all their component relationships are configured. Here you define your stacks programmatically, creating components and connecting them.
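To give you a feel for it, here's a rough sketch of the kind of thing that lives in /lib - the resources and names here are made up, and the real guide's stack is more complete:

// lib/sample-stack.js
const cdk = require('@aws-cdk/core');
const dynamodb = require('@aws-cdk/aws-dynamodb');
const lambda = require('@aws-cdk/aws-lambda');
const apigateway = require('@aws-cdk/aws-apigateway');

class SampleStack extends cdk.Stack {
    constructor(scope, id, props) {
        super(scope, id, props);

        // a table, a function that uses it, and an API route to that function
        const table = new dynamodb.Table(this, 'objects-table', {
            partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING }
        });

        const objectsHandler = new lambda.Function(this, 'objects-handler', {
            runtime: lambda.Runtime.NODEJS_12_X,
            code: lambda.Code.fromAsset('handlers'),
            handler: 'objects.handler',
            environment: { TABLE_NAME: table.tableName }
        });

        // this single call generates the IAM permissions that bind them
        table.grantReadWriteData(objectsHandler);

        const api = new apigateway.RestApi(this, 'objects-api');
        api.root.addResource('objects')
            .addMethod('GET', new apigateway.LambdaIntegration(objectsHandler));
    }
}

module.exports = { SampleStack };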



I find that the /lib folder is a good place to put your region configurations.
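Something as simple as this does the job (the file name and regions are just examples):

// lib/regions.js
module.exports = {
    regions: ['us-east-1', 'eu-west-1']
};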



Once you have completed this, the code to synthesize the stack(s) is located in the /bin folder. If you intend, like I do, to deploy multiple replications of your stack, this is the place to configure that.
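For example, a sketch that replicates the made-up stack from above for each stage and region:

// bin/sample.js
const cdk = require('@aws-cdk/core');
const { SampleStack } = require('../lib/sample-stack');
const { regions } = require('../lib/regions');

const app = new cdk.App();
for (const stage of ['dev', 'test', 'prod']) {
    for (const region of regions) {
        new SampleStack(app, `sample-${stage}-${region}`, {
            // your 12 digit account ID from step 1
            env: { account: '123456789012', region: region }
        });
    }
}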



See the AWS CDK API documentation for reference.


Step 4 - Lambda Layers

I've got much more to learn about layers at the time of writing, but my present needs are simple: the ability to make npm packages available to my lambda functions. CDK's Code.fromAsset method requires the original folder (as opposed to an archive file, which is apparently an option), and that folder needs to have the following internal structure:

.
./nodejs
./nodejs/package.json
./nodejs/node_modules/*

You can manually create this folder anywhere in your project, maintain the package.json file and remember to run npm install and npm prune every time you update it... or you can just copy in the build-layers script, maintain a package.json file in the layers/src/my-layer-name directory and run the script as part of your build process.
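Once the layer folder exists, wiring it into the stack is just a couple of lines of CDK (the paths and names here are made up):

// in the stack definition
const myLayer = new lambda.LayerVersion(this, 'my-layer', {
    code: lambda.Code.fromAsset('layers/build/my-layer-name'),
    compatibleRuntimes: [lambda.Runtime.NODEJS_12_X]
});

// then include it in any function that needs it by adding layers: [myLayer]
// to that function's properties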


Step 5 - Lambda Functions

There are a number of ways to construct lambda functions; I prefer to write mine as promises. There are two important things to note when putting together your lambda functions:

1. Error handling: if you don't handle your errors, lambda will handle them for you... and not gracefully. If you do want to implement your lambda functions as promises, as I have, try to use "resolve" to return your errors.

2. Response format: your response object MUST be in the following structure:
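Something along these lines - the standard API Gateway proxy integration shape, with the body as a string rather than an object:

{
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: "..." })
}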


This example lambda demonstrates using a promise to return both success and failure responses:
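A minimal sketch of such a handler (the full example lives in the repo):

exports.handler = (event) => {
    return new Promise((resolve) => {
        try {
            // ... do the actual work here ...
            resolve({
                statusCode: 200,
                headers: { "Content-Type": "application/json" },
                body: JSON.stringify({ message: "success" })
            });
        } catch (err) {
            // resolve rather than reject, so we stay in control of the response
            resolve({
                statusCode: 500,
                headers: { "Content-Type": "application/json" },
                body: JSON.stringify({ error: err.message })
            });
        }
    });
};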



Step 6 - Deployment

There are three steps to deploying and redeploying your stacks:
  1. Build your project
    If you're using Typescript, run
    > tsc
    If you're using layers, run the build-layers script (or perform the manual steps)
  2. Synthesize your project
    > cdk synth
  3. Deploy your stacks
    > cdk deploy stack-name
    Note: you can deploy multiple stacks simultaneously using wildcards
At the end of deployment, you will be presented with your endpoints. Unless your lambda has its own routing configured - see the sample dynamodb API examples that follow - simply make your HTTP requests to those URLs as is.


    Sample dynamodb API call examples:
  • List objects
    GET https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/prod/objects
  • Get specific object
    GET https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/prod/objects/b4e01973-053c-459d-9bc1-48eeaa37486e
  • Create a new object
    POST https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/prod/objects
  • Update an existing object
    POST https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/prod/objects/b4e01973-053c-459d-9bc1-48eeaa37486e

Step 7 - Debugging

Error reports from the API Gateway tend to hide useful details. If a function is not behaving correctly or is failing, go to your CloudWatch dashboard and find the log group for the function.

Step 8 - Profit!!!

I hope you've gotten good use of this guide! If you have any comments, questions or criticism, please leave them in the comments section or create issues at https://github.com/therightstuff/aws-cdk-js-dev-guide.

Saturday 28 March 2020

An improved (fairer) playlist shuffling algorithm

Lots of people find playlist shuffling insufficiently random for a variety of reasons, some of which have been addressed by the industry.

There's one aspect that my wife and I haven't seen addressed, though, and that's making sure that no songs get "left behind" whenever a playlist is reshuffled, whether intentionally or by switching back and forth between playlists.

In an attempt to sow seeds, I've just put together an example of an improvement that can easily be applied to any of the existing shuffle algorithms.
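The gist of it - and this is a simplified sketch rather than the code in the example itself - is to shuffle the songs you haven't heard yet separately from the ones you have, and queue all of the unplayed songs first:

function fairReshuffle(playlist, playedIds) {
    // standard Fisher-Yates shuffle
    const shuffle = (songs) => {
        for (let i = songs.length - 1; i > 0; i--) {
            const j = Math.floor(Math.random() * (i + 1));
            [songs[i], songs[j]] = [songs[j], songs[i]];
        }
        return songs;
    };
    const unplayed = playlist.filter((song) => !playedIds.has(song.id));
    const played = playlist.filter((song) => playedIds.has(song.id));
    // nobody gets left behind: everything not yet heard plays before any repeats
    return shuffle(unplayed).concat(shuffle(played));
}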

Tuesday 24 March 2020

The value of git (re)parenting

EDIT: I have subsequently learned about git merge --no-ff and no longer re-parent. But I'll leave this up in case someone finds it useful anyway.


I'm not a command-line person, I may have grown up with it but... let's just say I've developed an allergy. I want GUIs, I want simplicity and I want visual corroboration that I'm doing what I think I'm doing. I really don't see why I need to become an expert in every tool I use before it can be functional.

This is why I adore SourceTree, and (last time I used it) GitEye. Easier, more intuitive interface, and visualizations to help you see whether what you're doing makes sense. You're less likely to make mistakes!

I'm also (and for similar reasons) a huge fan of squashed merges in Git: one commit per feature / bugfix / hotfix, and (theoretically) the ability to go through the individual commits on my own time if I ever need to. I use interactive rebasing* a lot to achieve a similar result, but getting other devs to use it responsibly is a lot more trouble than teaching them to add --squash to the merge command. The downside is that because I've become used to interactive rebasing to squash my commits before merging, I've also become used to using regular merges and seeing a neat line on the branch graph showing me where my merges originated. Squashed commits don't give you that. And that's sad. It's also problematic - this morning I freaked out because I thought the code in a branch I'd squash-merged into wasn't where it needed to be, all because I was relying on those parenting lines and didn't immediately think to do a diff**.

* Ironically, from the command-line. It's the one thing I find less intuitive and more risky using SourceTree.

** While we're talking about branch diffs, if you're using Bitbucket don't trust the branch diff - the UI uses a three-dot diff and what you actually want is a two-dot, ie git diff branch1..branch2

So, after much surfing around the internets and learning lots more about git's inner workings than I care to, I came across an elegant little solution and have wrapped it with a bash script in the hopes that it'll be found useful. All you have to do is run this script as follows:

./add_parent.sh TARGET_COMMIT_ID NEW_PARENT_COMMIT_ID

Sunday 26 January 2020

Lessons from my first Direct-to-Print experience

The direct-to-print paperback edition of Shakespeare's Sonnets Exposed: Volume 1 is now up for review, after I finally ironed out the formatting kinks last night and finished fiddling with the cover around 2am. So now that I've had a chance to sleep on it, and re-re-re-review everything before submitting it, I have a few notes for anyone who wants to self-publish a book on Kindle.

1. Apple's Pages is a fantastic tool for a number of reasons, but what it produces by default really isn't great for reflowable epubs or paperback formats, which are both important if you want your work to be accessible, readable and attractive to your readers. It's a good place to start, though, just as Microsoft Word is, because it exports to Word and once you have a Word doc you can then import your work into Amazon's Kindle Creator.

2. I wish I'd begun with Kindle Creator, even though my intention is to publish on other platforms as well. Kindle is not only the easiest platform to get started on - and is probably the most accessible for your audience - but from a formatting perspective is effectively the lowest common denominator: it's tough to use custom fonts for Kindle publications, and I suspect they've made it so intentionally in order to standardize the reading experience.

My advice is to sort out the formatting for the ebook, publish it (see step 5), convert it to EPUB for other platforms, then tweak the formatting (and possibly content) for print publishing.

3. Don't bother adding your ISBN barcodes to the books yourself. Even if you have one issued for your paperback, it's best to use the KDP generated one for the Kindle direct-to-paperback offering and reserve any others for paperbacks where you have to add the code to the cover manually.

4. You can download cover templates here. I didn't realize that and I made my own, which in retrospect was silly.

5. After you've "published" your book, you'll have a KPF file that you can upload to KDP. From Converting KPF to EPUB format:

I recently managed to successfully convert kpf to epub format using jhowell's KFX conversion plugin for Calibre. Just install the plugin and use drag-and-drop to load your kpf file into Calibre. Then convert the kpf file to epub in the normal way using Calibre. Save your new epub to your desktop and then run Epubcheck on it to ensure that it is a valid epub (it always passes).

If you run into any issues with these suggestions, please let me know in the comments!

Monday 20 January 2020

ISBN codes for Dummies

Step 1

Acquire ISBN codes. For South African residents this is a free service (thank you, NLSA!), and all you have to do is request them and assign them (there's really no need to pay anyone any money - simply look up contact details on their website and call or email until you reach someone).

It's important to note that not only does each format (paperback, hardcover, etc) need its own ISBN, but each e-book format (eg. epub, mobi, PDF) does as well!

IMPORTANT UPDATE: I've recently been informed that eBookFairs.com has, in addition to a really cool concept, an excellent free online barcode generator that I've now tried out. It's far less effort than what I originally published, and takes care of both steps 2 and 3 below.
(NOTE: use the full ISBN including the check digit, see below for a detailed explanation).

Step 2

Generate the actual barcode using the ISBN 13 section of this free online generator. As explained here:
Before making an ISBN barcode, the user must first apply for an ISBN number. This number should be 10 or 13 digits, for example 0-9767736-6-X or 978-0-9767736-6-5. Once the ISBN number is obtained, it should be displayed above the barcode on the book. All books published after January 1, 2007 must display the number in the new 13-digit format, which is referred to as ISBN-13. Older 10 digit numbers may be converted to 13 digits with the free ISBN conversion tool.

The last digit of the ISBN number is always a MOD 11 checksum character, represented as numbers 0 through 10. When the check character is equal to 10, the Roman numeral X is used to keep the same amount of digits in the number. Therefore, the ISBN of 0-9767736-6-X is actually 0-9767736-6 with a check digit of 10. The ISBN check digit is never encoded in the barcode.
Simply remove the hyphens (dashes) and the check digit from your ISBN, paste it into the text box and hit "refresh". I recommend changing the image settings to PNG format with 300 DPI. You can also change the colors if you wish.
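If you'd like to verify the check digit arithmetic yourself, here's a quick sketch using the example ISBNs quoted above:

function isbn10CheckDigit(first9Digits) {
    // weights 10 down to 2, mod 11; a result of 10 is written as 'X'
    let sum = 0;
    for (let i = 0; i < 9; i++) {
        sum += (10 - i) * parseInt(first9Digits[i], 10);
    }
    const check = (11 - (sum % 11)) % 11;
    return check === 10 ? 'X' : String(check);
}

function isbn13CheckDigit(first12Digits) {
    // alternating weights of 1 and 3, mod 10
    let sum = 0;
    for (let i = 0; i < 12; i++) {
        sum += (i % 2 === 0 ? 1 : 3) * parseInt(first12Digits[i], 10);
    }
    return String((10 - (sum % 10)) % 10);
}

console.log(isbn10CheckDigit('097677366'));    // 'X' -> 0-9767736-6-X
console.log(isbn13CheckDigit('978097677366')); // '5' -> 978-0-9767736-6-5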

Step 3

You now have ISBNs and their barcodes, but for a professional look you'll want the right barcode title in the right font. That's as simple as adding "ISBN 0-9767736-6-X" above the barcode image. The free generator uses the Arial font, but the more traditional choice is a monospace font.