Can anomaly detection prove useful in client software? Can prediction? If you successfully trained an anomaly detector to catch issues that you, the software builder, agree are anomalies, then it could be hugely valuable in catching unknown issues. If you successfully trained a predictor to foresee an impending failure, you could use it as an opportunity to intervene, or to grab diagnostic information that will be unavailable once the error occurs.

This is a very cool vision, and I'm only in the early stages. This weekend, I hooked up NuPIC to an event stream and observed what happened.

All of my source code resides here: https://github.com/mrcslws/BackseatDriver/tree/hackathon2014

The input data

The source

I chose to listen to ETW events on Windows, specifically the Microsoft-IEFRAME provider, which emits events from the Internet Explorer UI.

Encoding it

I give the data to NuPIC as three columns: timestamp, thread id, event. The thread id and event are strings, which NuPIC simply treats as categories.

20:35:34.908, Thread 2408, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start)
20:35:49.830, Thread 5336, (Microsoft-IEFRAME LegacyHistoryQuery 52 Start)
20:35:49.830, Thread 5336, (Microsoft-IEFRAME LegacyHistoryQuery 53 Stop)
20:36:00.899, Thread 3652, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
20:36:00.912, Thread 3652, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start)
20:36:00.003, Thread 3652, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
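
Rows in this shape are easy to split back into the three columns. Here is a minimal sketch, assuming every row follows the "time, Thread N, (event)" layout shown above; `ROW_PATTERN` and `parse_row` are hypothetical names for illustration, not code from the repository.

```python
import re
from datetime import datetime

# Hypothetical pattern for the "time, Thread N, (event)" rows shown above.
ROW_PATTERN = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}), (Thread \d+), (\(.*\))$")


def parse_row(line):
    """Split one row into (timestamp, thread_id, event)."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("unrecognized row: %r" % line)
    time_text, thread_id, event = match.groups()
    # Only a time of day is present, so strptime fills in a default date.
    timestamp = datetime.strptime(time_text, "%H:%M:%S.%f")
    return timestamp, thread_id, event


row = parse_row(
    "20:35:49.830, Thread 5336, (Microsoft-IEFRAME LegacyHistoryQuery 52 Start)"
)
```

The thread id and event stay as plain strings, which matches how they are handed to NuPIC as categories.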

Configuring the model

The swarm

My input:

SWARM_CONFIG = {
    "includedFields": [
        {
            "fieldName": "timestamp",
            "fieldType": "datetime",
        },
        {
            "fieldName": "thread_id",
            "fieldType": "string",
        },
        {
            "fieldName": "event",
            "fieldType": "string",
        },
    ],
    "streamDef": {
        "info": "event",
        "version": 1,
        "streams": [
            {
                "info": "TwentyGithubTabs1967",
                "source": "file://data/TwentyGithubTabs1967.csv",
                "columns": [
                    "*"
                ]
            }
        ]
    },
    "inferenceType": "TemporalAnomaly",
    "inferenceArgs": {
        "predictionSteps": [
            1
        ],
        "predictedField": "event"
    },
    "swarmSize": "medium"
}

Its output:

The swarm ignores most of my input: it finds it can do better by focusing on the event sequence alone. Some day I'll prove it wrong.

MODEL_PARAMS = {'aggregationInfo': {'days': 0,
                     'fields': [],
                     'hours': 0,
                     'microseconds': 0,
                     'milliseconds': 0,
                     'minutes': 0,
                     'months': 0,
                     'seconds': 0,
                     'weeks': 0,
                     'years': 0},
 'model': 'CLA',
 'modelParams': {'anomalyParams': {u'anomalyCacheRecords': None,
                                   u'autoDetectThreshold': None,
                                   u'autoDetectWaitRecords': None},
                 'clParams': {'alpha': 0.050050000000000004,
                              'clVerbosity': 0,
                              'regionName': 'CLAClassifierRegion',
                              'steps': '1'},
                 'inferenceType': 'TemporalAnomaly',
                 'sensorParams': {'encoders': {u'event': {'fieldname': 'event',
                                                          'n': 121,
                                                          'name': 'event',
                                                          'type': 'SDRCategoryEncoder',
                                                          'w': 21},
                                               u'thread_id': None,
                                               u'timestamp_dayOfWeek': None,
                                               u'timestamp_timeOfDay': None,
                                               u'timestamp_weekend': None},
                                  'sensorAutoReset': None,
                                  'verbosity': 0},
                 'spEnable': True,
                 'spParams': {'columnCount': 2048,
                              'globalInhibition': 1,
                              'inputWidth': 0,
                              'maxBoost': 2.0,
                              'numActiveColumnsPerInhArea': 40,
                              'potentialPct': 0.8,
                              'seed': 1956,
                              'spVerbosity': 0,
                              'spatialImp': 'cpp',
                              'synPermActiveInc': 0.05,
                              'synPermConnected': 0.1,
                              'synPermInactiveDec': 0.05015},
                 'tpEnable': True,
                 'tpParams': {'activationThreshold': 14,
                              'cellsPerColumn': 32,
                              'columnCount': 2048,
                              'globalDecay': 0.0,
                              'initialPerm': 0.21,
                              'inputWidth': 2048,
                              'maxAge': 0,
                              'maxSegmentsPerCell': 128,
                              'maxSynapsesPerSegment': 32,
                              'minThreshold': 11,
                              'newSynapseCount': 20,
                              'outputType': 'normal',
                              'pamLength': 3,
                              'permanenceDec': 0.1,
                              'permanenceInc': 0.1,
                              'seed': 1960,
                              'temporalImp': 'cpp',
                              'verbosity': 0},
                 'trainSPNetOnlyIfRequested': False},
 'predictAheadTime': None,
 'version': 1}
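
In the dump above, a field the swarm dropped shows up as an encoder set to None. A small sketch for listing the fields that survived, assuming the params keep this nesting; `kept_fields` is a hypothetical helper and `example_params` is an abbreviated copy of the structure above.

```python
def kept_fields(model_params):
    """Return the encoder names the swarm left enabled (value is not None)."""
    encoders = model_params["modelParams"]["sensorParams"]["encoders"]
    return sorted(name for name, spec in encoders.items() if spec is not None)


# Abbreviated copy of MODEL_PARAMS; only the encoders section matters here.
example_params = {
    "modelParams": {
        "sensorParams": {
            "encoders": {
                "event": {"fieldname": "event", "n": 121, "name": "event",
                          "type": "SDRCategoryEncoder", "w": 21},
                "thread_id": None,
                "timestamp_dayOfWeek": None,
                "timestamp_timeOfDay": None,
                "timestamp_weekend": None,
            }
        }
    }
}

print(kept_fields(example_params))  # ['event']
```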

Making the output glance-able

I started with raw output that looked something like:

timestamp    thread_id    event    anomalyScore    prediction1
20:35:34.908    Thread 2408    (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start)    0    (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
20:35:49.830    Thread 5336    (Microsoft-IEFRAME LegacyHistoryQuery 52 Start)    1    (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
20:35:49.830    Thread 5336    (Microsoft-IEFRAME LegacyHistoryQuery 53 Stop)    1    (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
20:36:00.899    Thread 3652    (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)    1    (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
20:36:00.912    Thread 3652    (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start)    0    (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start)
20:36:00.003    Thread 3652    (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)    0    (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start)

I changed it to this:

I color-coded the most common events.
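
One way to do this kind of color-coding, sketched for a terminal. This is an assumption about the approach, not the rendering code behind this post: the ANSI palette, the cutoff of four colors, and the names `build_palette` and `colorize` are all made up for illustration.

```python
from collections import Counter

# ANSI foreground colors (red, green, yellow, blue); an arbitrary palette.
COLORS = ["\033[31m", "\033[32m", "\033[33m", "\033[34m"]
RESET = "\033[0m"


def build_palette(events):
    """Assign one color to each of the most frequent events."""
    most_common = Counter(events).most_common(len(COLORS))
    return {event: COLORS[i] for i, (event, _count) in enumerate(most_common)}


def colorize(event, palette):
    """Wrap a common event in its color; leave rare events unstyled."""
    color = palette.get(event)
    return event if color is None else color + event + RESET
```

Rare events stay uncolored on purpose, so they stand out against the repetitive background of the common ones.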

Noteworthy moments in the output

Foreword

I used the generic UI thread loop events as "training wheels" for validating my model.

This weekend I never took the training wheels off.

The beginning

Overlapping start/stop events from multiple tabs

Even after hundreds of thousands of events, this still causes a burst of anomalies.

This is why I tried adding thread IDs, but the swarm still determined that the model was better off without them. Maybe it would change its mind with a different dataset.
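
To illustrate why thread IDs seemed promising: when two tabs' loops interleave, the merged stream no longer alternates Start/Stop, but each thread's own stream still does. A minimal sketch with made-up data; `split_by_thread` and the shortened event names are hypothetical, and this is not something the model here actually does.

```python
from collections import defaultdict


def split_by_thread(rows):
    """Group (thread_id, event) pairs into one ordered event list per thread."""
    per_thread = defaultdict(list)
    for thread_id, event in rows:
        per_thread[thread_id].append(event)
    return dict(per_thread)


# Hypothetical interleaving of two tabs' message loops.
interleaved = [
    ("Thread 2408", "WaitMessage Start"),
    ("Thread 3652", "WaitMessage Start"),
    ("Thread 2408", "WaitMessage Stop"),
    ("Thread 3652", "WaitMessage Stop"),
]

merged = [event for _thread, event in interleaved]
per_thread = split_by_thread(interleaved)
```

Here `merged` reads Start, Start, Stop, Stop, while each list in `per_thread` reads Start, Stop.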

The traumatized model

These unexplainable anomalies keep happening. Sometimes they even happen when the event was correctly predicted with 100% likelihood!

The case of the traumatized model

As seen above, I see unexplainable bursts of anomaly scores even when the prediction was 100% correct.

I have theories about why this happens, and they involve the classifier. The next step is to observe more than the OPF prediction_result currently exposes.
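
For reference while digging: a minimal sketch of how a raw anomaly score can be computed, assuming it is the fraction of currently active columns that were not predicted on the previous step. The function name and the handling of the empty case are choices made for this sketch. If the score works this way, it is derived from column-level predictions rather than from the classifier's output, which is one possible way for the two to disagree.

```python
def raw_anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that were not among the predicted ones."""
    active = set(active_columns)
    if not active:
        # Nothing active, so nothing can be surprising (a choice for this sketch).
        return 0.0
    overlap = len(active & set(predicted_columns))
    return 1.0 - overlap / float(len(active))
```

A score of 0 means every active column was predicted; a score of 1 means none were.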

Say hello to predictive_bits:

Let's explore two predictions.

The good:

The bad:

What were their bits?

The good:

[ 91 1633 3767 4638 4694 6107 10112 10113 10114 10115 10116 10117 10118 10119 10120 10121 10122 10123 10124 10125 10126 10127 10128 10129 10130 10131 10132 10133 10134 10135 10136 10137 10138 10139 10140 10141 10142 10143 12255 13593 14798 15988 18735 20896 20897 20898 20899 20900 20901 20902 20903 20904 20905 20906 20907 20908 20909 20910 20911 20912 20913 20914 20915 20916 20917 20918 20919 20920 20921 20922 20923 20924 20925 20926 20927 21216 21217 21218 21219 21220 21221 21222 21223 21224 21225 21226 21227 21228 21229 21230 21231 21232 21233 21234 21235 21236 21237 21238 21239 21240 21241 21242 21243 21244 21245 21246 21247 22683 23632 26332 26872 26944 26945 26946 26947 26948 26949 26950 26951 26952 26953 26954 26955 26956 26957 26958 26959 26960 26961 26962 26963 26964 26965 26966 26967 26968 26969 26970 26971 26972 26973 26974 26975 27362 29116 30374 34496 34497 34498 34499 34500 34501 34502 34503 34504 34505 34506 34507 34508 34509 34510 34511 34512 34513 34514 34515 34516 34517 34518 34519 34520 34521 34522 34523 34524 34525 34526 34527 39670 41878 44888 45797 51213 51262 53721 53824 53825 53826 53827 53828 53829 53830 53831 53832 53833 53834 53835 53836 53837 53838 53839 53840 53841 53842 53843 53844 53845 53846 53847 53848 53849 53850 53851 53852 53853 53854 53855 54197 55601 55778 56789 57243 62046 62143 63867 65225]
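
The list above contains several runs of 32 consecutive numbers, which matches 'cellsPerColumn': 32 in the tpParams. A small sketch for spotting those runs, assuming (this is not confirmed here) that each number is a cell index and that a column's cells are numbered contiguously, so column = index // 32. The helper names are hypothetical.

```python
from collections import Counter

CELLS_PER_COLUMN = 32  # 'cellsPerColumn' in tpParams above


def cells_per_column_counts(cell_indices, cells_per_column=CELLS_PER_COLUMN):
    """Count how many of the listed cells fall in each column."""
    return Counter(index // cells_per_column for index in cell_indices)


def fully_listed_columns(cell_indices, cells_per_column=CELLS_PER_COLUMN):
    """Return the columns for which every cell appears in the list."""
    counts = cells_per_column_counts(cell_indices, cells_per_column)
    return sorted(col for col, n in counts.items() if n == cells_per_column)


# First three entries of "the good" list plus its first run of 32.
sample = [91, 1633, 3767] + list(range(10112, 10144))
print(fully_listed_columns(sample))  # [316]
```

Under that assumption, the isolated numbers are single cells in a column, and each run of 32 is a column with all of its cells listed.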

The bad:



Next steps


Closing

I've got more work to do. Building this kind of thing will take many deliberate small steps. I think this approach is worthwhile. And fun.