Working with Microsoft Azure for 20 hours and why I will not use it again
Last weekend I attended a Hackathon at Microsoft. Overall it was an awesome experience and I had a lot of fun, so this post has nothing to do with the event itself and also is not my overall opinion on Microsoft. They do awesome stuff in a lot of fields, but with Azure, they are definitely under delivering.
During the event, I started to get in touch with the Azure platform. Our project Idea was to create a website where you can search for news and then via sentiment analysis this news would be sorted by “happiness”. The news search and sentiment analysis are offered via Azures so-called cognitive services that abstract the ML models away and you can simply use an API for accessing that services....so far so good. With this preconditions most of you coders out there will have the thought: “This sounds too easy to fill 24h of programming”. Exactly what I thought...already thinking about also coding an Alexa skill and so on to fill the time. With two experienced developers, we thought the backend would be done in about 4h (conservative calculation) as it would only be stitching together three APIs and delivering that info to a JSON REST API for our frontend team. For keeping the fun up and having more learnings during the project we decided to do the backend as a serverless function. But then Azure came into our way...
So for sure we did things wrong and it could have been done faster, but this blog post is about the hard onboarding and overall bad structure of Azure. Please also keep in mind that Azure is still in a more-or-less early stage and not everything is wrong in there. In the following, I will walk you through the timeline of this disaster and suggestions I would have in mind to fix some of the most confusing steps. Actually, I will try to avoid these mistakes in my own future projects, so thanks Microsoft by showing me a way how not to do things xD
Just a bit more background: My partner in the backend had some experience with GCP and I do most of my current projects with AWS, so we did know how stuff works there...couldn't be too hard to transfer that knowledge to the Azure platform.
Start of the project
So first of all creating a new Azure account, that is not that hard and after entering credit card info you get 100$ of free credit. I actually like how Microsoft solved that here: You have two plans. You start with the 100$ free tier and if you spend all of that money you manually have to change to the pay-as-you-go plan. So that protects you of opening up an account, doing some testing, forgetting about it and then a month later you get a huge bill (happened to me with AWS). So that is nice for protecting new users that just start to test the system. Good job here Microsoft!
After setting up the account I created a new project and added some of the resources we needed. Creating a serverless function I recognized the tag “(Preview)” on the function I created but didn't think more about it...but actually, that sign should be something like Experimental/Do not use/Will most likely not work properly. We created a Python serverless function (apparently Python functions are still beta here) and tried to get some code in there.
There are three ways to get code into an azure function:
- Web “editor”
- Azure CLI
- VS Code
...for full-featured functions. As we selected the experimental/beta/preview functionality Python we only had the latter two options. Not that bad as it is the same for AWS and I am used to deploying my code via the AWS cmd...shouldn't be way harder with Azure.
My suggesting: Do not do publish functionality that is obviously not ready yet. Do internal testing instead of using your users for that task.
Azure plugins for VS code
Microsoft overs a wide range of VS code plugins for Azure. As that is my main editor anyways I wanted to give them a try. So for the functionality of serverless functions, you need the functions plugin and about 9 other mandatory ones that are some sort of base plugins. 500mb and three VS Code crashed later finally the required plugins were installed properly. The recommended login method did not work and I had to choose the method via the browser. Not that big of a deal, but as they recommend the inline method I would think that should work. (Didn't work for the other folks in my team as well...so had nothing to do with my particular machine)
You would think that 500mb should be enough for finally being able to deploy some code...but you still need 200mb more for the Azure cli that is required for the plugins to work properly.
Finally having installed all of it you can see all your Azure functions and resources in VS code. I started to get a bit excited as it looked that the development from now on would be straight forward and easier as I am used to from AWS.
But that 700mb of code did not work properly....the most important function “deploy” did fail without a more in-depth error message...AAAAAAARRRG. Why do I have to install all that crap and then it can't do the most simple task it has to get my code into their cloud.
Keep your tooling modular and try to do fewer things, but do them right
A nice idea is, on creating a new serverless function Azure greets you with a basic boilerplate code example showing you how to handle the basic data interfaces.
Better no boilerplate code than one in the wrong programming language
But at least it is colorful
So next try with the Azure CLI. The first thing that you recognize is that the CLI has all sorts of different colors...but that does not help if you are annoyed and want to get things done.
That is a thing you also recognize in the Azure web interface...it has quite some UX issues but they do have more than five color themes you can choose from for styling the UI...not sure Microsoft if you set your priorities right here ;)
Also, the CLI did not get us where we wanted....either of our own incompetence or the CLI, no clue. Either way, I would blame Azure as it is their job to help developers onboarding and at least get basic tasks (we still only want to deploy a simple “hello world”) done in an acceptable time.
Focus less on making your UI shine in every color of the rainbow and try to improve documentation and onboarding examples
Full ownership of a resource still does not give you full privileges
After finally being able to deploy at least the “hello world” we wanted to go a step further...work concurrently on that project. Yes 'till now we mainly did pair programming on a single machine.
As I was the owner of that resource I wanted to give my teammate also full access to it, so he could work on the resource and add functions if required. I granted him “owner” access rights (the highest that were available) but still, he was not able to work properly with that function. In the web UI it did work more or less but than again in VS code no chance to do anything (adding a function or deploying it). I ended up doing a thing that goes against everything I learned about security: I logged in with my credentials at his machine.
So imagine yourself now already sitting in front of your laptop for about 4 ½ hours and you did not manage to do anything of your real work.
Ditching Azure Functions and switching to GCP
That was the moment when we did ditch the idea of doing the backend as Azure function. We switched to GCP and started there all over again. As I also never worked with that platform I expected a similar hard start as I already had in the last few hours with Azure. So about 25 minutes later we achieved more on GCP than with azure 'till then.
A thing both Azure and GCP do better than AWS is they have the Logs of a serverless function in the same window as the function itself. AWS has here a different approach and you have to change to the cloud logs when you want to get info about your function and how it worker. Props to both Google and Microsoft for solving this a lot better!
Actually a hint for AWS: Give your user all control and info at a single place
The prices you could win at the Hackathon were attached to using Azure and thereby we stick to the cognitive services for doing the news search and the sentiment analysis. Overall the API is straight forward: Send your data and get the results back.
One thing we got told in a presentation and that you should keep in mind when using the cognitive services: You do not control the model and it could change at any moment in time. So if you use the cognitive services for productive use, you should continuously check that the API didn't change its behavior in a way that influences your product in a bad way. But most of the time it is still a lot cheaper and better than building the model yourself
The problem that we did have with the services where again authentication issues. Quite confusing some of the cognitive services (e.g. the sentiment analysis) are have different API base URLs depending where you register that cognitive service and others are not. As I assume they need that manual setting of data centers for a particular (unknown to me) reason. Indeed I would propose to have all the cognitive services bound to a location.
The news search, for example, is not bound to a location and so we had two different behaviors of the API base URLs in our so short and easy application:
- One URL for all location.
- Only a certain location is valid for your resource. If you point to a wrong API location you get an “unauthorized” as the response
Pointing to the wrong location is pure incompetence on the developer side but it would help a lot if there would be a distinct error code/message for that scenario.
Have the same base URL behavior for all cognitive services
Return some sort of 'wrong location'-error if you have a valid API token but you are pointing to the wrong location
Insufficient documented SDKs
Azure offers SDKs for using their services. We gave the JS SDK for the cognitive services a try. Here we stumbled upon two sides of a medal: First props to the developers coding that SDKs they are straight forward and do what they should. Even the code itself looks good...but why the hell do I have to look into the code of the SDKs to get all the options the functions offer? When you stick to the documentation provided via the GitHub readme or NPM you only get a fraction of the functionality. We were confused that the own SKDs of Microsoft seemed not to be API complete. Looking into the code we saw they are actually API complete and do offer a lot more options then documented.
Please Microsoft: Properly document your functionalities!
IMO there must be deep problems with the internal release processes at Azure. It is not acceptable that an IT company already being so long in the industry allows itself such a standard mistake. You should not release your products (and I see the SDKs as such) without proper documentation.
During our try and error period for trying to get the JS SDK running, we stumbled upon the quickstart guide for the cognitive services Quickstart: Analyze a remote image using the REST API with Node.js in Computer Vision
Instead of using their own SDK and explaining how to use it they show you how to manually build an HTTP request in JS. Sure that can be helpful for new JS coders, but if you have an SDK for that particular reason...why are you not using it? Looks like the left hand is not knowing what the right-hand does.
Stick to one way of doing things. If you have an SKD, also use it in your quickstart guides for being consistent
In the end, we did port the code back from GCP to an Azure function (again ~1h of work). We selected JS instead of Python and coded completely in the web UI...that did work. I now know how real Microsoft business developers do their daily business...never leave the web UI and just accept that life is hard.
Microsoft failed to deliver a good enough experience here and lost me as a potential customer. How can it be that I was able to do the same stuff in a fraction of the time in GCP? (And keep in mind: it was already 3am in the morning, I was super tired and I also never worked with GCP before)
None of the three major players is perfect and sure I understand it is hard to deliver fast and keeping good quality in this highly competitive market. But maybe actually going the step further will help to win in the end.
Once again: This is me only rating the onboarding experience of Azure in particular! No general opinion on Microsoft.
Last one: The Azure web UI didn't work in Chrome. So if you have issues with that, Firefox did the trick for us ;)