## Neural network basics – Activation functions

Neural networks have a very interesting aspect – they can be viewed as a simple mathematical model that defines a function. For a given function $f(x)$ that can take any input value of $x$, there will be some kind of neural network satisfying that function. This hypothesis was proven almost 20 years ago (“Approximation by Superpositions of a Sigmoidal Function” and “Multilayer feedforward networks are universal approximators”) and forms the basis of many of the #AI and #ML use cases possible today.

It is this aspect of neural networks that allows us to map any process and generate a corresponding function. Unlike a function in computer science, this function isn’t deterministic; instead, it is a confidence score of an approximation (i.e. a probability). The more layers in a neural network, the better this approximation will be.

In a neural network, there is typically one input layer, one output layer, and one or more layers in the middle. To the external system, only the input layer (the values of $x$) and the final output (the output of the function $f(x)$) are visible; the layers in the middle are not, and are essentially hidden.

Each layer contains nodes, which are modeled after how neurons in the brain work. The output of each node gets propagated along to the next layer. This output is the defining characteristic of the node, and activates the node to pass on its value to the next node; this is very similar to how a neuron in the brain fires, passing the signal on to the next neuron.

For the generalization of the function $f(x)$ outlined above to hold, that function needs to be continuous. A continuous function is one where small changes to the input value $x$ create small changes to the output of $f(x)$. If these changes are not small and the value jumps a lot, then the function is not continuous, and it is difficult for a neural network to achieve the approximation required.

For a neural network to ‘learn’, the network essentially has to try different weights and biases that produce a corresponding change in the output, ideally one closer to the result we desire. Ideally, small changes to these weights and biases correspond to small changes in the output of the function. But one isn’t sure, until we train and test the result, that small changes don’t cause bigger shifts that drastically move away from the desired result. It isn’t uncommon to see that one aspect of the result has improved while others have not, skewing the overall results.

In simple terms, an activation function is attached to the output of a neural network node, and maps the resulting value into a range such as 0 to 1. It is also used to connect two neural networks together.

An activation function can be linear or non-linear. A linear one isn’t terribly effective, as its range is infinite. A non-linear function with a finite range is more useful, as it can be mapped as a curve; changes on this curve can then be used to calculate the difference between two points on the curve.

There are many types of activation function, each with their strengths. In this post, we discuss the following six:

• Sigmoid
• Tanh
• ReLU
• Leaky ReLU
• ELU
• Maxout

1. Sigmoid function

A sigmoid function can map any input value into a probability – i.e., a value between 0 and 1. A sigmoid function is typically denoted using a sigma ($\sigma$); some also call it a logistic function. For any given input value $x$, the definition of the sigmoid function is as follows:

$\sigma(x) \equiv \frac{1}{1+e^{-x}}$

If our inputs are $x_1, x_2,\ldots$, their corresponding weights are $w_1, w_2,\ldots$, and the bias is $b$, then the previous sigmoid definition is updated as follows:

$\frac{1}{1+\exp(-\sum_j w_j x_j-b)}$
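As a quick illustration, here is the sigmoid of a weighted input in plain Python (a minimal sketch; the names `sigmoid` and `sigmoid_neuron` are mine, not from any library):

```python
import math

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^-x): squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_neuron(inputs, weights, bias):
    # weighted sum z = sum_j w_j * x_j + b, then squashed by the sigmoid
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)
```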

When plotted, the sigmoid function looks like the curve below. When we use this in a neural network, we essentially end up with a smoothed-out function, unlike a binary function (also called a step function) that is either 0 or 1.

For the sigmoid function, as $x \rightarrow \infty$, $\sigma(x)$ tends towards 1. And as $x \rightarrow -\infty$, $\sigma(x)$ tends towards 0.

And this smoothness of $\sigma$ is what creates the small changes in the output that we desire – where small changes to the weights ($\Delta w_j$) and small changes to the bias ($\Delta b$) produce a small change in the output ($\Delta \mbox{output}$).

Fundamentally, changing these weights and biases is what can give us either a step function or small changes. We can show this as follows:

$\Delta \mbox{output} \approx \sum_j \frac{\partial \, \mbox{output}}{\partial w_j} \Delta w_j + \frac{\partial \, \mbox{output}}{\partial b} \Delta b$
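The approximation above can be sanity-checked numerically for a single sigmoid neuron (a minimal sketch; the variable names are mine): nudging one weight by a small $\Delta w$ should change the output by approximately the partial derivative times the nudge.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# a single sigmoid neuron: output = sigma(w*x + b)
x, w, b = 1.0, 0.5, -0.2
out = sigmoid(w * x + b)

# analytic partial derivative: d(output)/dw = sigma'(z) * x,
# where sigma'(z) = sigma(z) * (1 - sigma(z))
d_out_dw = out * (1.0 - out) * x

# nudge the weight a little and compare against the linear approximation
dw = 1e-4
delta_actual = sigmoid((w + dw) * x + b) - out
delta_approx = d_out_dw * dw
```

The two deltas agree to several decimal places, which is exactly the “small change in, small change out” behavior the formula describes.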

One thing to be aware of is that the sigmoid function suffers from the vanishing gradient problem – convergence across the various layers is very slow after a certain point, as the neurons in earlier layers learn much more slowly than the neurons in later layers. Because of this, a sigmoid is generally avoided.
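The vanishing gradient is easy to see from the sigmoid’s derivative, $\sigma'(x) = \sigma(x)(1-\sigma(x))$, which is at most 0.25. Backprop multiplies roughly one such factor per layer, so the gradient reaching early layers shrinks geometrically (an illustrative sketch, not a real training loop):

```python
import math

def sigmoid_prime(x):
    # derivative of the sigmoid: sigma'(x) = sigma(x) * (1 - sigma(x));
    # its maximum value is 0.25, at x = 0
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# even in the best case (x = 0 everywhere), the gradient reaching the
# first of 10 sigmoid layers has shrunk by a factor of 0.25 ** 10
grad = 1.0
for _ in range(10):
    grad *= sigmoid_prime(0.0)
```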

2. Tanh (hyperbolic tangent function)

Tanh is a variant of the sigmoid function and still quite similar – it is a rescaled version, ranging from –1 to 1 instead of 0 to 1. As a result, its optimization is easier, and it is preferred over the sigmoid function. The formula for tanh is

$\tanh(x) \equiv \frac{e^x-e^{-x}}{e^x+e^{-x}}$

Using this, we can show that:

$\sigma(x) = \frac{1 + \tanh(x/2)}{2}$.
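This identity is easy to verify numerically (a quick sketch using Python’s `math` module):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_via_tanh(x):
    # the rescaling identity: sigma(x) = (1 + tanh(x / 2)) / 2
    return (1.0 + math.tanh(x / 2.0)) / 2.0

# compare the two forms over a spread of inputs
errors = [abs(sigmoid(x) - sigmoid_via_tanh(x)) for x in (-7.5, -1.0, 0.0, 2.0, 5.0)]
```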

Tanh also suffers from the vanishing gradient problem. Both tanh and sigmoid are used in FNNs (feedforward neural networks) – i.e. networks where the information always moves forward and there isn’t any feedback loop.

3. Rectified Linear Unit (ReLU)

The rectified linear unit (ReLU) is the most popular activation function in use these days. It is defined as:

$\sigma(x) = \begin{cases} x & x > 0\\ 0 & x \leq 0 \end{cases}$
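In code, the ReLU is a one-liner (a sketch; `relu` is my own name for it):

```python
def relu(x):
    # identity for positive inputs, zero for everything else
    return x if x > 0 else 0.0
```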

ReLUs are quite popular for a couple of reasons. One, from a computational perspective, they are more efficient and simpler to execute – there aren’t any exponential operations to perform. And two, they don’t suffer from the vanishing gradient problem.

The one limitation ReLUs have is that their output isn’t in the probability space (i.e. it can be > 1), so they can’t be used in the output layer.

As a result, when we use ReLUs, we have to use a softmax function in the output layer. The outputs of a softmax function sum to 1, so we can map them as a probability distribution:

$\sum_j a^L_j = \frac{\sum_j e^{z^L_j}}{\sum_k e^{z^L_k}} = 1.$
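A minimal softmax sketch (the max-shift is a standard numerical-stability trick, not something the formula above requires):

```python
import math

def softmax(z):
    # subtracting max(z) doesn't change the result but avoids overflow
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

The outputs are all between 0 and 1 and sum to 1, so they can be read as a probability distribution over the classes.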

Another issue that can affect ReLUs is something called the dead neuron problem (also called a dying ReLU). This can happen when, in the training dataset, some features have a negative value. When the ReLU is applied, those negative values become zero (per the definition). If this happens at a large enough scale, the gradient will always be zero – and that node is never adjusted again (its bias and weights never get changed) – essentially making it dead! The solution? Use a variation of the ReLU called a Leaky ReLU.

4. Leaky ReLU

A Leaky ReLU allows a small slope $\alpha$ on the negative side; i.e. the negative value isn’t changed to zero, but rather scaled by a small factor such as 0.01. You can probably see the ‘leak’ in the image below. This ‘leak’ helps increase the range, and we never get into the dying ReLU issue.
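A sketch of the Leaky ReLU, with the small negative-side slope $\alpha$ defaulting to 0.01 (the names are mine):

```python
def leaky_relu(x, alpha=0.01):
    # small slope alpha on the negative side keeps the gradient non-zero
    return x if x > 0 else alpha * x
```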

5. Exponential Linear Unit (ELU)

Sometimes a ReLU isn’t fast enough. Over time, a ReLU’s mean output isn’t zero, and this positive mean can add a bias to the next layer in the neural network; all this bias adds up and can slow the learning.

The Exponential Linear Unit (ELU) can address this by using an exponential function on the negative side, which ensures that the mean activation is closer to zero. What this means is that for a positive value, an ELU acts like a ReLU, and for a negative value it is bounded to $-\alpha$ (i.e. $-1$ for $\alpha = 1$) – which pushes the mean activation closer to zero.

$\sigma(x) = \begin{cases} x & x \geqslant 0\\ \alpha (e^x - 1) & x < 0\end{cases}$
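A direct transcription of the ELU definition (a sketch; `elu` is my own name for it):

```python
import math

def elu(x, alpha=1.0):
    # identity for x >= 0; smoothly bounded below by -alpha for x < 0
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)
```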

When learning, it is the derivative of this function that is fed back (backprop) – so for this to be efficient, both the function and its derivative need to have a low computation cost.

And finally, there is another variant that combines the ReLU and the Leaky ReLU, called the Maxout function.
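A Maxout unit takes the maximum over several learned linear functions of the input; the ReLU is the special case $\max(0, x)$. A minimal sketch (the names and the piece encoding are mine):

```python
def maxout(x, pieces):
    # pieces is a list of (w, b) pairs; the unit outputs the maximum
    # of the linear functions w * x + b
    return max(w * x + b for w, b in pieces)

def relu_via_maxout(x):
    # ReLU is the special case with pieces (0, 0) and (1, 0): max(0, x)
    return maxout(x, [(0.0, 0.0), (1.0, 0.0)])
```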

So, how do I pick one?

Choosing the ‘right’ activation function would of course depend on the data and problem at hand. My suggestion is to default to a ReLU as a starting step, remembering that ReLUs are applied to hidden layers only. Use a simple dataset and see how that performs. If you see dead neurons, then use a Leaky ReLU or Maxout instead. It doesn’t make sense to use sigmoid or tanh these days for deep learning models, but they are still useful for classifiers.

In summary, activation functions are a key aspect that fundamentally influences a neural network’s behavior and output. Having an appreciation and understanding of some of these functions is key to any successful ML implementation.

## Netron – deep learning and machine learning model visualizer

I was looking at something else and happened to stumble across something called Netron, which is a model visualizer for #ML and #DeepLearning models. It is certainly much nicer than anything else I have seen. The main thing that stood out for me was that it supports ONNX and a whole bunch of other formats: Keras, CoreML, TensorFlow (including Lite and JS), Caffe, Caffe2, and MXNet. How awesome is that?

This is essentially a cross-platform PWA (progressive web app) built using Electron (JavaScript, HTML5, CSS) – which means it can run on most platforms and runtimes, from just a browser to Linux, Windows, etc. To debug it, it is best to use Visual Studio Code along with the Chrome debugger extension.

Below are a couple of examples of visualizing a ResNet-50 model – you can see both the start and the end of the visualization in the two images below to get a feel of things.

Start of ResNet-50 Model

End of ResNet-50 model

And some of the complex models seem very interesting. Here is an example of a TensorFlow Inception (v3) model.

And of course, this can get very complex (below is the same model, just zoomed out more).

I do think it is a brilliant tool to help understand the flow of things, and what one can do to optimize or fix. It is also very helpful for folks who are just starting to learn and appreciate the nuances.

## Machine learning use-cases

Someone recently asked me: what are some of the use cases / examples of machine learning? Whilst this might seem obvious to some of us, it isn’t the case for many businesses and enterprises – despite the fact that they use elements of #ML (and #AI) in their daily lives as consumers.

The discussion gets more interesting based on the specific domain and the possible use cases (understanding, of course, that some might not be sure of the use case – hence the question in the first place). But this did get me thinking, and I wanted to share one of the images we use internally as part of our training that outlines some of the use cases.

These are not 1:1, and many of them can be combined to address various use cases – for example, an #IoT device sending in sensor data that triggers a boundary condition (via a #RulesEngine), which, in addition to executing one or more business rules, can trigger an alert to a human-in-the-loop (#AugmentingWorkforce) via a #DigitalAssistant (say #Cortana) to make her/him aware, or to confirm some corrective action, and the likes. The possibilities are endless – but each of these elements triggered by AI/ML is still a narrow case and needs to be thought of in the holistic picture.

## Synthetic Sound

I trained a model to create a synthetic sound that sounds like me. This is after training it with only about 30 sentences – which isn’t a lot.

To create a synthetic voice, you enter some text, which is then “transcribed” using #AI to generate your synthetic voice. In my case, at first I had written ‘AI’, which was generated as “aeey” (you can have a listen here). So for the next one, I changed ‘AI’ to ‘Artificial Intelligence’.

One does need to be mindful of #DigitalEthics as this technology improves further – and this is with only a very small sample of data. Imagine what could happen with public figures, whose recordings are quite easily available in the public domain. I am thinking the ‘digital twang’ is one of the signatures and ways to stamp this as a generated sound.

## My self-driving car

Over the last few weeks, I built a self-driving car – which essentially is a remote-control RC car that uses a Raspberry Pi running Python and TensorFlow, implementing an end-to-end convolutional neural network (CNN).

Of course, other than being a bit geeky, I do think this is very cool to help understand and get into some of the basic constructs and mechanics around a number of things – web page design, hardware (maker things), and Artificial Intelligence principles.

There are two different models here – they use the same ESC and controller that can be programmed. My 3D printer did mess up a little (my supports were a little off), which is why you see the top isn’t clean.

The sensor and camera are quite basic, and there are provisions to add more and do better over time. The Pi isn’t powerful enough to train the model – you need another machine for that (preferably one with an Intel i7 and a GPU). Once trained, you can run the model on the Pi for inference.

This is the second car, which has slightly different hardware, but the ESC to control the motor and actuators is the same.

The code is simple enough; below is an example of the camera attached to the Pi saving the images it is seeing. The tub is the location where the images are saved; these can then be transferred to another machine for training or inference.

```python
import donkey as dk

# initialize the vehicle
V = dk.Vehicle()

# add the Pi camera part; it outputs the image it sees on each loop
cam = dk.parts.PiCamera()
V.add(cam, outputs=['image'], threaded=True)

# add tub part to record images
tub = dk.parts.Tub(path='~/d2/gettings_started',
                   inputs=['image'],
                   types=['image_array'])
V.add(tub, inputs=['image'])

# start the vehicle's drive loop
V.start(max_loop_count=100)
```


Below you can see the car driving itself around the track, where it had to be trained first. The reason it is not driving perfectly is that during training (when I was manually driving it around), I crashed a few times, and as a result the training data was messed up. I needed more time to clean that up and retrain it.

This is based on donkey car – an open-source DIY platform for small-scale self-driving cars. I think it is also perfect to experiment with for those who have teenagers and slightly older kids. You can read up more details on how to go about building this, and the parts needed, here.

## Cloud and failure

Despite all the cloud talk – and where I live, it is like the cloud mecca – for enterprises it is still quite new, and many are just starting to think about it. A hard lesson that many of us learn (and partly how we amass our scars) is to design for failure. Those who run things in their enterprise data centers are quite spoilt, I think. Failures are rare, and if machines or state go down, moving to another one isn’t really a big deal (of course, it is a little more complex, and that’s not to say there isn’t any downtime, or business loss, etc.).

When thinking about a cloud migration (hybrid or otherwise), a key rule is that you are guaranteed to have failures – in many aspects – and those cannot be exceptional conditions, but rather the normal, expected behavior to design for. As a result, your app/services/API/whatever needs to be designed for failure. And it is not only about how loosely you couple your architecture to handle these situations, but also about how the response isn’t binary (yay, or a fancy 404), but rather a degraded experience, where your app/service/API/whatever still performs, albeit in a degraded mode.

Things that can throw one off, and are food for thought (not exhaustive, nor in any particular order):

• Managing state (when failure is guaranteed)
• Latency – the cloud is fast, but slower than your internal data center; you know – physics. 🙂 How are your REST APIs handling latency, and are they degrading gracefully?
• “Chattiness” – how talkative are your things on the wire? And how big is the payload?
• Rollback, or fall forward?
• Lossy transfers (if data structure sizes are large)
• DevOps – the mashing up of Developers and Operations (what some call SRE) – you own the stuff you build, and are responsible for it.
• AutoScale – most think this is about scaling up, but it also means scaling down when resources are not needed.
• Physical deployments – regional deployments vs. global ones – there isn’t a right or wrong answer; it frankly depends on the service and what you are trying to do. Personally, I would lean towards regional first.
• Production deployment strategies – there are various ways to skin a cat, and none is right or wrong per se (except, please don’t do a basic in-place deployment – that is suicide). I am used to A/B testing, but there is also what is now called Blue/Green deployment. Read up more here. And of course, use some kind of a deployment window (that works for your business) – this allows you and your team to watch what is going on, and take corrective actions if required.
• Automate everything you can; yes, it’s not free, but you recoup that investment pretty quickly – and you will still have hair on your scalp!
• Instrument – if you can’t measure it, you can’t fix it.

Again, this is not an exhaustive list, but rather meant to get one thinking. There are also some inherent assumptions – e.g. automation and production deployment suggest there is some automated testing in place, and a CI/CD strategy and supporting tools.

Bottom line – when it comes to cloud (or any other distributed architecture), the best way to avoid failure is to fail constantly!

## How managers become leaders

Recently, a few of us went through a workshop where one piece of ‘homework’ was to score oneself on the following seven aspects – attributes that allow one to grow from being (hopefully) a good manager to a great leader.

In most enterprises, as one grows in their career, managers need to acquire new capabilities – and quickly. The skills and capabilities that got her or him to this place won’t be enough for the next step – as the scope and complexity increase, it can leave executives overwhelmed. At the core, new executives need support on these seven dimensions to help them make the transition.

• Specialist to generalist – Understand the mental models, tools, and terms used in key business functions and develop templates for evaluating the leaders of those functions.
• Analyst to Integrator – Integrate the collective knowledge of cross-functional teams and make appropriate trade-offs to solve complex organizational problems.
• Tactician to Strategist – Shift fluidly between the details and the larger picture, perceive important patterns in complex environments, and anticipate and influence the reactions of key external players.
• Bricklayer to Architect – Understand how to analyze and design organizational systems so that strategy, structure, operating models, and skill bases fit together effectively and efficiently, and harness this understanding to make needed organizational changes.
• Problem Solver to Agenda Setter – Define the problems the organization should focus on, and spot issues that don’t fall neatly into any one function but are still important.
• Warrior to Diplomat – Proactively shape the environment in which the business operates by influencing key external constituencies, including the government, NGOs, the media, and investors.
• Supporting Cast Member to Lead Role – Exhibit the right behaviors as a role model for the organization and learn to communicate with and inspire large groups of people both directly and, increasingly, indirectly.

I was surprised at how few people talk about this. These come from an awesome HBR article called How Managers Become Leaders, which, if you haven’t read it, I would highly recommend.

So, what can one do? The suggestions outlined are not rocket science, but something to think about. And fundamentally, they are not that much different from how the armed forces train new officers.

• Experience on cross-functional projects
• An international assignment
• Exposure to a broad range of business situations – accelerated growth, sustaining success, realignment, turnaround.
• When a high potential’s leadership promise becomes evident, give them:
  • A position on a senior management team
  • Experience with external stakeholders
  • An assignment as chief of staff for an experienced enterprise leader
  • An appointment to lead an acquisition integration or a substantial restructuring
• Just before their first leadership promotion:
  • Send them to an executive program that addresses capabilities like organizational design, business process improvement, and transition management.
  • Give them a business to run that is small, distinct, and thriving, staffed with an experienced and assertive team that they can learn from.

## Downloading the Microsoft Build 2018 sessions

Just as last year, I wrote a PowerShell script with which you can download the PowerPoint decks and videos from Microsoft Build’s conference, instead of streaming them (or manually downloading them one by one). You can choose if you want the decks, the videos, or both. For the videos you can choose the desired resolution (Low, Medium, High) – of course, the higher the resolution, the more space is needed. The script also downloads the session description and the session image (if there is one).

A few points to note:

• The slides alone, once downloaded, are ~10 GB; with videos (high resolution), the size goes up to ~90.5 GB. So make sure you have enough space.
• By default the download location is “C:\build-2018\”; you can change this to whatever you want, but make sure there is a trailing backslash. Think of this as the ‘base’ folder.
• For each session a sub-folder with the session name will be created in the ‘base’ folder setup in the previous step.
• If a file already exists, it will be skipped.
• As each file is downloaded, it is saved in the base folder; only once the download is complete is it moved into the relevant subfolder.
• If a download fails and you want to retry it, delete the ‘left over’ file(s) in the base folder and then run the script again. The script itself will ‘eat’ the exception and move on to the next file.
• The video quality parameter is 1 for Low, 2 for Medium, and 3 for High (default).

And if you read through it, the script is quite self-explanatory.

```powershell
# Comments that you should read, before you kick this off. Yes, seriously. 🙂
# 1. Setup the folder where to download using the parameters outlined below
# 2. Loop through and get the slides first
# 3. Finally, loop through and get the videos last

param (
    [string]$path = "C:\build-2018\",
    [switch]$sessionvideo = $true,
    [int][ValidateRange(1,3)]$videoquality = 3,
    [switch]$sessiondeck = $true
)

[Environment]::CurrentDirectory = (Get-Location -PSProvider FileSystem).ProviderPath
$rss = (New-Object net.webclient)

# Filenames might get long, so keep the base path short!
if (-not (Test-Path $path)) {
    Write-Host "Folder $path doesn't exist. Creating it..."
    New-Item $path -type directory
}
Set-Location $path

if ($sessiondeck) {
    # Grab the slides RSS feed - Build 2018
    $slides = ($rss.downloadstring("http://s.ch9.ms/events/build/2018/rss/slides"))

    # ********** download the decks **********
    try {
        foreach ($item in $slides.rss.channel.item) {
            $code = $item.comments.split("/") | select -last 1

            # Get the url for the pptx file
            $urlpptx = New-Object System.Uri($item.enclosure.url)

            # Make the filename readable
            $filepptx = $code + " - " + $item.title.Replace(":", "-").Replace("?", "").Replace("/", "-").Replace("<", "").Replace("|", "").Replace('"', "").Replace("*", "").Replace("’", "'").Replace("'NEW SESSION'", "")
            $filepptx = $filepptx.substring(0, [System.Math]::Min(120, $filepptx.Length)).trim()
            $filejpg = $filepptx + "_960.png"
            $filepptx = $filepptx + ".pptx"

            $folder = $item.title.Replace(":", "-").Replace("?", "").Replace("/", "-").Replace("<", "").Replace("|", "").Replace('"', "").Replace("*", "").Replace("’", "'").Replace("'NEW SESSION'", "")
            $folder = $folder.substring(0, [System.Math]::Min(100, $folder.Length)).trim()

            if (-not (Test-Path $folder)) {
                Write-Host "Folder $folder doesn't exist. Creating it..."
                New-Item $folder -type directory
            }

            # Make sure the PowerPoint file doesn't already exist
            if (!(Test-Path "$path\$folder\$filepptx")) {
                # Echo out the file that's being downloaded
                $filepptx
                $wc = (New-Object System.Net.WebClient)
                Invoke-WebRequest $urlpptx -OutFile $path\$filepptx

                # Download the session image, but don't break if it doesn't
                # exist; hence the nested try block
                try {
                    if ($item.thumbnail -ne $null) {
                        $urljpg = New-Object System.Uri($item.thumbnail.url)
                        if (!(Test-Path "$path\$filejpg")) {
                            $wc.DownloadFile($urljpg, "$path\$folder\$filejpg")
                        }
                    }
                }
                catch {
                    Write-Host "Image $filejpg doesn't exist ... eating the exception and moving on ..."
                }

                mv $filepptx $folder
            }
            else {
                Write-Host "PPTX: $filepptx exists; skipping download."
            }

            # Try and get the session details
            try {
                $descriptionFileName = "$($path)\$($folder)\$($code.trim()).txt"
                if (!(Test-Path "$descriptionFileName")) {
                    $OutFile = New-Item -type file $descriptionFileName -Force
                    $Content = "Title: " + $item.title.ToString().trim() + "`r`n`r`n" +
                        "Presenter: " + $item.creator + "`r`n`r`n" +
                        "Summary: " + $item.summary.ToString().trim() + "`r`n`r`n" +
                        "Link: " + $item.comments.ToString().trim()

                    # Some categories are missing, so eat the exception if needed
                    # (this is a hack and not very elegant)
                    try {
                        if ($item.category -ne $null) {
                            $Content = $Content + "`r`n`r`n" + "Category: " + $item.category.ToString().trim().Replace("+", " ")
                        }
                    }
                    catch {
                        # do nothing; eat the exception
                    }

                    Add-Content $OutFile $Content
                }
            }
            catch {
                $ErrorMessage = $_.Exception.Message
                $FailedItem = $_.Exception.ItemName
                Write-Host "`t" $ErrorMessage "`n" $FailedItem
            }
        } #end-loop foreach
    }
    catch {
        Write-Host "Oops, could not find any slides."
        $ErrorMessage = $_.Exception.Message
        $FailedItem = $_.Exception.ItemName
        Write-Host "`t" $ErrorMessage "`n" $FailedItem
    }
}

# If you don't want the videos but only the slides, pass -sessionvideo:$false
try {
    if ($sessionvideo) {
        switch ($videoquality) {
            1 { $video = ($rss.downloadstring("http://s.ch9.ms/events/build/2018/rss/mp3")); break }
            2 { $video = ($rss.downloadstring("http://s.ch9.ms/events/build/2018/rss/mp4")); break }
            default { $video = ($rss.downloadstring("http://s.ch9.ms/events/build/2018/rss/mp4high")); break }
        }

        foreach ($item in $video.rss.channel.item) {
            # Grab the URL for the MP4 file
            $url = New-Object System.Uri($item.enclosure.url)

            # Make the filename readable
            $file = $item.title.Replace(":", "-").Replace("?", "").Replace("/", "-").Replace("<", "").Replace("|", "").Replace('"', "").Replace("*", "").Replace("’", "'").Replace("'NEW SESSION'", "")
            $file = $file.substring(0, [System.Math]::Min(120, $file.Length)).trim() + ".mp4"

            $folder = $item.title.Replace(":", "-").Replace("?", "").Replace("/", "-").Replace("<", "").Replace("|", "").Replace('"', "").Replace("*", "").Replace("’", "'").Replace("'NEW SESSION'", "")
            $folder = $folder.substring(0, [System.Math]::Min(100, $folder.Length)).trim()

            if (-not (Test-Path $folder)) {
                Write-Host "Folder $folder doesn't exist. Creating it..."
                New-Item $folder -type directory
            }

            # Make sure the video file doesn't already exist
            if (!(Test-Path "$folder\$file")) {
                # Echo out the file that's being downloaded
                $file
                try {
                    if (!(Test-Path "$path\$file")) {
                        Invoke-WebRequest $url -OutFile $path\$file
                        # Move it from the base folder to the target folder
                        mv $file $folder
                    }
                    else {
                        Write-Host "Video: $file - another process is possibly working on this; skipping download."
                    }
                }
                catch {
                    $ErrorMessage = $_.Exception.Message
                    $FailedItem = $_.Exception.ItemName
                    Write-Host "`t" $ErrorMessage "`n" $FailedItem
                }
            }
            else {
                Write-Host "Video: $file exists; skipping download."
            }
        } #end-loop foreach
    } #end - video check
}
catch {
    Write-Host "Oops, could not find any videos or some other error happened."
    $ErrorMessage = $_.Exception.Message
    $FailedItem = $_.Exception.ItemName
    Write-Host "`t" $ErrorMessage "`n" $FailedItem
}

Write-Host "*************** All Done! Woot! ***************"
```

## Certificate error with git and Donkey Car

If you were trying to pull the latest source code on your Raspberry Pi for donkeycar and get the following error, then probably your clock is off (and I guess some nonce is failing). This can happen if your Pi has been powered off for a while (as in my case) and its clock is off (clock drift is a real thing). 🙂

```
fatal: unable to access 'https://github.com/wroscoe/donkey/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
```

To fix this, the following commands work. It seems the Raspberry Pi 3 has NTP disabled by default, and the first command enables it. I also checked the resulting status with the second command, and forced it with the third one.

```shell
sudo timedatectl set-ntp True
timedatectl status
sudo timedatectl set-local-rtc true
```


And that should do it; you might need to reboot the Pi just to get it back on, and then you should be able to pull the code off git and deploy your autonomous car.

## AI photos–style transfer

Can #AI make me look (more) presentable? The jury is out I think.

This is called style transfer, where the style/technique from one kind of painting (it could be a photo too) is applied to an image to create a new image. I took this using the built-in camera on my machine, sitting at my desk, and then applied the different ‘styles’ to it. Each of these styles is a separate #deeplearning model that has learned how to apply the relevant style to a source image.

Style – Candy

Style – Feathers

Style – Mosaic

Style – Robert

Specifically, this uses a neural network (#DeepLearning) model called VGG19, which is a 19-layer model running on TensorFlow. Of course, you can export this to an ONNX model, which can then be used in most other runtimes and libraries.

This is inspired by the paper – Perceptual Losses for Real-Time Style Transfer and Super-Resolution (available via Cornell’s arXiv). Below is a snapshot of the VGG code.

```python
import numpy as np
import scipy.io
import tensorflow as tf

def net(data_path, input_image):
    layers = (
        'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',
        'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',
        'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3',
        'relu3_3', 'conv3_4', 'relu3_4', 'pool3',
        'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3',
        'relu4_3', 'conv4_4', 'relu4_4', 'pool4',
        'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3',
        'relu5_3', 'conv5_4', 'relu5_4'
    )

    # load the pre-trained VGG19 weights (matconvnet .mat format)
    data = scipy.io.loadmat(data_path)
    mean = data['normalization'][0][0][0]
    mean_pixel = np.mean(mean, axis=(0, 1))
    weights = data['layers'][0]

    net = {}
    current = input_image
    for i, name in enumerate(layers):
        kind = name[:4]
        if kind == 'conv':
            kernels, bias = weights[i][0][0][0][0]
            # matconvnet: weights are [width, height, in_channels, out_channels]
            # tensorflow: weights are [height, width, in_channels, out_channels]
            kernels = np.transpose(kernels, (1, 0, 2, 3))
            bias = bias.reshape(-1)
            current = _conv_layer(current, kernels, bias)
        elif kind == 'relu':
            current = tf.nn.relu(current)
        elif kind == 'pool':
            current = _pool_layer(current)
        net[name] = current

    assert len(net) == len(layers)
    return net

def _conv_layer(input, weights, bias):
    conv = tf.nn.conv2d(input, tf.constant(weights), strides=(1, 1, 1, 1),
                        padding='SAME')
    return tf.nn.bias_add(conv, bias)

def _pool_layer(input):
    return tf.nn.max_pool(input, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1),
                          padding='SAME')
```


If you are interested in playing with this, you can download the code. Personally, I like the Mosaic style the best.

## DARPA’s perspective on AI

One of the challenges we have with AI is that there isn’t any universal definition – it is a broad category that means everything to everyone. Debating the rights and the wrongs, and the should’s and the shouldn’t’s, is another post though.

DARPA outlines this as the “programmed ability to process information” across a certain set of criteria that spans perceiving, learning, abstracting, and reasoning.

They classify AI in three waves, as outlined below. Each of these is at a different level on the intelligence scale. I believe it is important to have a scale such as this – it will help temper expectations and compare apples to apples; for enterprises, it will help create roadmaps for outcomes and their implementations; and finally, it will help cut through the noise of the hype cycle that AI has generated.

##### Wave 1 – Handcrafted Knowledge

The first wave operates on a very narrow problem area (the domain) and essentially has no (self-)learning capability. The key thing to understand is that the machine can explore specifics based on knowledge and a related taxonomy/structure defined by humans. We create a set of rules to represent the knowledge in a well-defined domain.

Of course, as the autonomous-driving Grand Challenge taught us – it cannot handle uncertainty.

##### Wave 2 – Statistical Learning

The second wave has better classification and prediction capabilities, much of which comes via statistical learning. Essentially, problems in certain domains are solved by statistical models, which are trained on big data. It still doesn’t have contextual ability and has minimal reasoning ability.

A lot of what we are seeing today is related to this second wave; one of the hypotheses holding this up is called the manifold hypothesis. It essentially states that high-dimensional data (e.g. images, speech, etc.) tends to lie in the vicinity of low-dimensional manifolds.

A manifold is an abstract mathematical space which, in a close-up view, resembles the spaces described by Euclidean geometry. Think of it as a set of points satisfying certain relationships, expressible in terms of distance and angle. Each manifold represents a different entity and the understanding of the data comes by separating the manifolds.

Using handwritten digits as an example – each image is an element in a set with 784 dimensions, and the images form a number of different manifolds.
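To make the 784 concrete: flattening a 28×28 grayscale image gives one point in a 784-dimensional space (a sketch with a synthetic image, not real MNIST data):

```python
import random

# a synthetic 28x28 grayscale 'digit': one intensity value per pixel
image = [[random.random() for _ in range(28)] for _ in range(28)]

# flattening the grid yields a single point in 784-dimensional space
point = [pixel for row in image for pixel in row]
```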

Separating each of these manifolds (by stretching and squishing the data) to get them isolated is what the layers in a neural network do. Each layer in the neural network computes its output from the preceding layer of inputs (usually via a non-linear function) – learning from the data.

So, in statistical learning, one designs and programs the network structure based on experience. Here is an example of how the number 2, to be recognized, goes through the various feature maps.

And one can combine and layer the various kinds of neural networks together (e.g. a CNN + RNN).

And whilst it is statistically impressive, it is also individually unreliable.

##### Wave 3 – Contextual Adaptation

The future of AI is what DARPA calls contextual adaptation – where models explain their decisions, and those explanations are then used to drive further decisions. Essentially, one ends up in a world where we construct contextual explanatory models that reflect real-world situations.

In summary, we are in the midst of Wave 2 – which is already very exciting. For an enterprise, it is key to have a scale like this for the ability to process information – it will help make this AI revolution more tangible and manageable.

PS – if you want to read up more on the manifold hypothesis and how it plays into neural networks, I would suggest reading Chris’s blog post.