Object and scene detection with #AI

Continuing the previous #ArtificialIntelligence theme, I wanted to see how Amazon’s Rekognition works and how it differs from the #AI offerings from others, such as Microsoft.

Here are the confidence scores for a #ProjectMurphy image. I am glad to see that there is a 99% confidence that this is a person.

Object and Scene detection

The POST request is quite simple:

{
  "method": "POST",
  "path": "/",
  "region": "us-west-2",
  "headers": {
    "Content-Type": "application/x-amz-json-1.1",
    "X-Amz-Date": "Thu, 01 Dec 2016 22:21:01 GMT",
    "X-Amz-Target": "com.amazonaws.rekognitionservice.RekognitionService.DetectLabels"
  },
  "contentString": {
    "Attributes": [
      "ALL"
    ],
    "Image": {
      "Bytes": "..."
    }
  }
}

And so is the response:

{
  "Labels": [
    {
      "Confidence": 99.2780990600586,
      "Name": "People"
    },
    {
      "Confidence": 99.2780990600586,
      "Name": "Person"
    },
    {
      "Confidence": 99.27307891845703,
      "Name": "Human"
    },
    {
      "Confidence": 73.7669448852539,
      "Name": "Flyer"
    },
    {
      "Confidence": 73.7669448852539,
      "Name": "Poster"
    },
    {
      "Confidence": 68.23612213134765,
      "Name": "Art"
    },
    {
      "Confidence": 58.291263580322266,
      "Name": "Brochure"
    },
    {
      "Confidence": 55.91957092285156,
      "Name": "Modern Art"
    },
    {
      "Confidence": 53.9996223449707,
      "Name": "Blossom"
    },
    {
      "Confidence": 53.9996223449707,
      "Name": "Flora"
    },
    {
      "Confidence": 53.9996223449707,
      "Name": "Flower"
    },
    {
      "Confidence": 53.9996223449707,
      "Name": "Petal"
    },
    {
      "Confidence": 53.9996223449707,
      "Name": "Plant"
    },
    {
      "Confidence": 50.69965744018555,
      "Name": "Face"
    },
    {
      "Confidence": 50.69965744018555,
      "Name": "Selfie"
    }
  ]
}
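The response is a flat list of labels with confidence scores, so picking out the ones worth trusting is a small filtering job. Below is a minimal Python sketch of that; the function name `labels_above` and the cutoff of 90 are my own assumptions for illustration, and the sample is a trimmed copy of the response above rather than a live call to the service.

```python
def labels_above(response, min_confidence):
    """Return (name, confidence) pairs at or above min_confidence, highest first."""
    labels = response.get("Labels", [])
    kept = [
        (label["Name"], label["Confidence"])
        for label in labels
        if label["Confidence"] >= min_confidence
    ]
    # sorted() is stable, so labels with equal confidence keep their original order
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Trimmed copy of the DetectLabels response shown above
sample = {
    "Labels": [
        {"Confidence": 99.2780990600586, "Name": "People"},
        {"Confidence": 99.2780990600586, "Name": "Person"},
        {"Confidence": 99.27307891845703, "Name": "Human"},
        {"Confidence": 73.7669448852539, "Name": "Flyer"},
        {"Confidence": 50.69965744018555, "Name": "Face"},
    ]
}

# The cutoff of 90 is an arbitrary value chosen for this example
for name, confidence in labels_above(sample, 90):
    print(f"{name}: {confidence:.1f}%")
```

With a cutoff that high, only the people-related labels survive; lowering it towards 50 brings in the poster and flower guesses as well.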

Here is what the facial analysis shows:

Facial Analysis

However, how does it handle something a little more complex?

Object and Scene detection

And finally, what of the comparison? I think there might be some more work to be done on that front.

Face Comparison capture

Here is the response:

{
  "FaceMatches": [
    {
      "Face": {
        "BoundingBox": {
          "Height": 0.3878205120563507,
          "Left": 0.2371794879436493,
          "Top": 0.22435897588729858,
          "Width": 0.3878205120563507
        },
        "Confidence": 99.79533386230469
      },
      "Similarity": 0
    }
  ],
  "SourceImageFace": {
    "BoundingBox": {
      "Height": 0.209781214594841,
      "Left": 0.4188888967037201,
      "Top": 0.13127413392066955,
      "Width": 0.18111111223697662
    },
    "Confidence": 99.99442291259765
  }
}
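One thing worth noticing in this response is that the two numbers mean different things: "Confidence" says how sure the service is that it found a face, while "Similarity" says how alike the two faces are. Here is a small Python sketch that separates the two; the helper name `matching_faces` and the default cutoff of 80 are assumptions of mine for the example, and the dictionary is a trimmed copy of the response above.

```python
def matching_faces(response, threshold=80.0):
    """Return the FaceMatches entries whose Similarity meets the threshold."""
    return [
        match
        for match in response.get("FaceMatches", [])
        if match.get("Similarity", 0) >= threshold
    ]


# Trimmed copy of the face comparison response shown above
response = {
    "FaceMatches": [
        {"Face": {"Confidence": 99.79533386230469}, "Similarity": 0}
    ],
    "SourceImageFace": {"Confidence": 99.99442291259765},
}

matches = matching_faces(response)
print(f"faces detected in target: {len(response['FaceMatches'])}")
print(f"faces considered a match: {len(matches)}")
```

So a face was found in both images with very high confidence, but with a similarity of 0 nothing would count as a match at any sensible cutoff.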

Playing with #AI

So, I have been spending a lot of time recently on many things related to Artificial Intelligence (#AI). More on that some day. 🙂

I was curious about Amazon’s announcement yesterday that it is jumping on this bandwagon. Of course, Microsoft and others have been there for a while. I don’t know to what extent Amazon has been working on this, but given that Alexa has been out for a couple of years, I know they have had rich pickings for tuning this further.

I thought Polly (like the parrot?) was quite different from the things I have seen from others. It is a text-to-speech service: it renders the input text as synthesized speech in various languages and dialects, with a few voices to choose from for each. You use it through a simple API (the Android example shows it is not very complex to consume, though of course you still need to think about the overall design and the usual elements of software engineering: latency, limits, bandwidth, etc.). Should you desire, you can customize it using pronunciation lexicons, which let you tweak how particular words are spoken.

Here are a few examples. Of course, none of them are me, hence the “cold”.

Australian (Male):

Indian (Female):

Italian (Male):

US/American (Male):

Of course, if you play with it, it is easy to pick up the patterns and see what is being changed and what is not. But kudos to the team on this. I think it will help accelerate the adoption of #AI.
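To give a flavor of how simple the API is to consume, here is a rough Python sketch using boto3's Polly client and its `synthesize_speech` call. Treat it as a sketch under assumptions: the voice IDs (Russell, Raveena, Giorgio, Joey) are my guess at voices matching the samples above, the region is simply borrowed from the Rekognition request in the other post, and the helper names are made up for this example. You need boto3 and AWS credentials to actually make the call.

```python
# Voice IDs are assumptions chosen to mirror the samples above,
# not necessarily the voices used to produce them.
VOICES = {
    "Australian (Male)": "Russell",
    "Indian (Female)": "Raveena",
    "Italian (Male)": "Giorgio",
    "US/American (Male)": "Joey",
}


def build_requests(text, voices=VOICES):
    """Build one synthesize_speech keyword-argument dict per voice."""
    return {
        label: {"Text": text, "OutputFormat": "mp3", "VoiceId": voice_id}
        for label, voice_id in voices.items()
    }


def synthesize_all(text, region="us-west-2"):
    """Call Polly once per voice and return {label: mp3 bytes}."""
    import boto3  # imported lazily so build_requests works without boto3 installed

    polly = boto3.client("polly", region_name=region)
    audio = {}
    for label, kwargs in build_requests(text).items():
        result = polly.synthesize_speech(**kwargs)
        audio[label] = result["AudioStream"].read()
    return audio
```

Keeping the request building separate from the network call makes it easy to see (and check) exactly what changes between voices: only the `VoiceId`.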

Real-time performance capture – HoloPortation?

Some of the folks who worked on the PPI and HoloPortation teams at MSR left to set up a new company called PerceptiveIO.

They have recently published a paper called Fusion4D: Real-time performance capture of challenging scenes. In it they cover some of the work around multi-view performance capture, and the raw depth acquisition and preprocessing that needs to be done for it. Interestingly, this also handles deformation changes (e.g. taking off a jacket or a scarf); these can be non-rigid and much more difficult to handle, but they are done beautifully.

Combining this with the likes of HoloLens would make it quite interesting. If you want to see more, check out the video below showing the examples and transitions. Perhaps one day, it will allow us to see and experience events from afar. 🙂

HoloLens–Device Portal (Part 2)

In addition to the HoloLens Device Portal (see part 1), another option is the UAP HoloLens companion app, which you can install from the store. I think this is a little more end-user friendly, and perhaps a little less developer focused. It exposes a subset of the same functionality.

Once you install it, you connect in more or less the same manner. I think most people will like the live streaming option. There is a bit of latency between the device and what is shown, but that could partly be because of our (possibly crappy) wireless, which was overloaded with many folks at work.

Store option

Once you connect and set it up, you see the above screen. Of course, you can manage multiple devices from here.

Once you log in, you see a lot of the same information as you saw in Part 1.

You can see the live stream as shown here; what might not be obvious is that both sound and video are streamed. In this screenshot you can see my (work) login screen, with the password login being a Hologram. Here it is ‘floating’ over the window, and you can see a flavor of the mixed reality.

As you can expect, you can capture either a photo or a video of what is being seen through the device.

The photos or videos that you take show up here. I suppose they are saved on the device and you would want to copy them off it.

The virtual keyboard is again, I think, one of the best features – it saves so much time air-tapping, and saves your arms. 🙂

The App manager can do some elements of management, but not as much as the web version.

And finally, you can see some details on the device. I think the Shutdown and Reboot options are probably the most useful ones.

All in all, this is a little more polished and end-user friendly. Useful when demo’ing the mixed reality solutions you are building.

HoloLens–Device Portal (Part 1)

One of the advantages of running Windows 10 on the HoloLens is that it has all the regular features you would expect. From a developer’s perspective, one of those is the Device Portal, which is awesome. It is essentially a web server hosted on the device that allows you to manage it over Wi-Fi or USB.

It is a must-have if you want to stream your apps (including Holograms) so that others can see them; alternatively, you can record and then share. And of course there are details for various debug situations, and the Virtual Input saves your fingers from getting tired! 🙂 You also use this to sideload the apps you build. There are REST APIs you can use if you want to program against it, and there is also a UAP app in the store (more on that in part 2).
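If you want to poke at the REST APIs from code, something along these lines should be a reasonable starting point. It is only a sketch under assumptions: the `/api/os/info` path is recalled from the Device Portal API reference and should be checked against the documentation for your build, the use of HTTPS with basic authentication is likewise assumed, and the IP address and credentials are placeholders. The sketch only builds the request; sending it is left to you.

```python
import base64
import urllib.request


def build_request(device_ip, path="/api/os/info", username="", password=""):
    """Build a GET request for a Device Portal endpoint using HTTP basic auth.

    The default path is an assumption; verify it against the API reference.
    """
    url = f"https://{device_ip}{path}"
    request = urllib.request.Request(url, method="GET")
    if username:
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder address and credentials, purely for illustration
request = build_request("192.168.1.50", username="user", password="pw")
print(request.get_method(), request.full_url)
```

To actually send it you would pass the request to `urllib.request.urlopen`; expect to deal with the device's self-signed certificate before that works over Wi-Fi.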

To get to it, you browse to the device’s IP address. Below are a few screenshots from my playing around, which show the various aspects of the portal and what you can do. And the beauty of this is that, as a Windows developer, it should all be very familiar and nothing new. 🙂

Home Screen – once you log in

3D View Settings

Mixed reality capture – one of the key elements that lets you share the magic with others

Perf Tracing and the various levels you can set as part of the Windows Performance Toolkit. This is WPR/WPA support in System.Diagnostics.Tracing – see this post for more details.

Process details, which you can sort by the relevant column.

Process details #1 – showing various details from Power to Framerate to IO, Memory, etc.

Process details #2

Process details #3

App Manager, which is where you sideload apps and manage them

Crash Data – the name says it all

Kiosk Mode – this is really interesting; you can ‘lock’ into one app and use that. I wonder how one breaks out of it when done and wanting to get back to ‘regular’ mode.

All the ETW (Event Tracing for Windows) details and the providers you could want. Again, pretty standard stuff.

Simulation – not sure if this is used for regression or for playback in another setting, where the room capture would help. It does open up interesting possibilities. I think it might allow one to capture the spatial mapping of a room, which you might then be able to use in the emulator (such as someone has done here).

Networking Configuration – where you go to manage the network settings.

Virtual Input – a great time saver.

And finally, some of the security settings to ensure no one on the same subnet is mucking with you; or, when there is more than one device, that you are talking to the right one.

Creative Coding

As we start to play with and explore new AR/VR mediums like Oculus and HoloLens, there is a strong shift away from the traditional, transactional, known-outcome way of working towards a more expressive and exploratory model. In the context of many enterprises this is a bigger shift; they have started seeing some of it with mobility, but it is still not the same.

I really like how Rick explains and expresses this both in terms of definition and thinking. The clay analogy I think really helps.

bash on Windows is real–not a VM

I have talked to a few folks recently, and they still don’t believe bash on Windows (RS1) is ‘real’ and think it is some kind of a VM. No, it is not. It is the ‘real’ Linux user mode running on Windows. It is not Cygwin, and it is not a VM. It is essentially all of the user mode (i.e. Linux without the kernel).

The kernel in this case is a wrapper around the NT kernel that translates Linux system calls into their Windows equivalents, and then things run. As far as the Linux binaries are concerned, it is the same code and doesn’t need any changes. Technically this is called Windows Subsystem for Linux (WSL).

On Windows, this is installed in the user space, so each user effectively gets their own instance, which is isolated from the other users. Once you install it (and if you are still reading this, then you probably know how to install it), it shows up under C:\Users\your-user-ID\AppData\Local\lxss. If you can’t find that folder, you can still type the path and navigate to it. Below is a screenshot of what this looks like:

image

It is quite interesting, and I have been mucking around with it. Here you can see the installation of gcc:

image

And here is the output of the CPU details:

image

root@localhost:/proc# cat cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 78
model name      : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping        : 3
microcode       : 0xffffffff
cpu MHz         : 2808.000
cache size      : 256 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm pni pclmulqdq est tm2 ssse3 fma cx16 xtpr sse4_1 sse4_2 movbe popcnt aes xsave osxsave avx f16c rdrand hypervisor
bogomips        : 5616.00
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 78
model name      : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping        : 3
microcode       : 0xffffffff
cpu MHz         : 2808.000
cache size      : 256 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm pni pclmulqdq est tm2 ssse3 fma cx16 xtpr sse4_1 sse4_2 movbe popcnt aes xsave osxsave avx f16c rdrand hypervisor
bogomips        : 5616.00
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 78
model name      : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping        : 3
microcode       : 0xffffffff
cpu MHz         : 2808.000
cache size      : 256 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm pni pclmulqdq est tm2 ssse3 fma cx16 xtpr sse4_1 sse4_2 movbe popcnt aes xsave osxsave avx f16c rdrand hypervisor
bogomips        : 5616.00
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 78
model name      : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping        : 3
microcode       : 0xffffffff
cpu MHz         : 2808.000
cache size      : 256 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm pni pclmulqdq est tm2 ssse3 fma cx16 xtpr sse4_1 sse4_2 movbe popcnt aes xsave osxsave avx f16c rdrand hypervisor
bogomips        : 5616.00
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

root@localhost:/proc#
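That is a lot of output to eyeball, so here is a small Python sketch that boils /proc/cpuinfo down to the two numbers that matter here: logical processors and physical cores. The function names are my own, and the sample is a cut-down stand-in with the same processor and core id layout as the output above, not the full text.

```python
import re


def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical processor."""
    processors = []
    # Processor entries are separated by blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            processors.append(entry)
    return processors


def summarize(processors):
    """Return (logical processor count, distinct physical core count)."""
    cores = {(p.get("physical id"), p.get("core id")) for p in processors}
    return len(processors), len(cores)


# Cut-down stand-in for the output above: four processors on two cores
SAMPLE = "\n\n".join(
    f"processor : {n}\nphysical id : 0\ncore id : {n // 2}\ncpu cores : 2"
    for n in range(4)
)

logical, physical = summarize(parse_cpuinfo(SAMPLE))
print(f"{logical} logical processors on {physical} physical cores")
```

Inside bash you could feed it the real thing by reading /proc/cpuinfo into a string and passing that to `parse_cpuinfo` instead of the sample.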

All in all, a very interesting world. A few things to note:

  • This is still in beta, so there will be issues.
  • It is user mode and not server mode. Live with it.
  • There will be path issues if you run into the 256-character path limit of Windows and then try to manipulate those paths in bash.
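On the path limit, a quick way to spot trouble before it bites is to check lengths up front. The sketch below assumes the classic Win32 MAX_PATH value of 260 characters, which counts the terminating NUL; take off the drive prefix (e.g. C:\) as well and you are left with roughly the 256 characters mentioned above. The function name is made up for this example.

```python
MAX_PATH = 260  # classic Win32 limit, counting the terminating NUL


def exceeds_max_path(windows_path, limit=MAX_PATH):
    """True if the path will not fit in `limit` characters including the NUL."""
    return len(windows_path) + 1 > limit


ok_path = "C:\\" + "a" * 256        # 259 characters, just fits
too_long = "C:\\" + "a" * 257       # 260 characters, one too many

print(exceeds_max_path(ok_path), exceeds_max_path(too_long))
```

Running a check like this over a directory tree before touching it from bash is a cheap way to find the paths that are likely to misbehave.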

Happy hacking!
