Can anomaly detection prove useful in client software? Can prediction? If you successfully trained an anomaly detector to catch issues that you, the software builder, agree are anomalies, then it could be hugely valuable in catching unknown issues. If you successfully trained a predictor to foresee an impending failure, you could use it as an opportunity to intervene, or to grab diagnostic information that will be unavailable once the error occurs.

This is a very cool vision. I'm just in the very early stages. This weekend, I hooked up NuPIC to an event stream, and observed what happened.

All of my source code resides here: https://github.com/mrcslws/BackseatDriver/tree/hackathon2014

The input data

The source

I chose to listen to ETW (Event Tracing for Windows) events, specifically the Microsoft-IEFRAME provider, which corresponds to events from the Internet Explorer UI.
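
As a reference point, here's a rough way to capture these events offline with built-in Windows tools: logman records an ETW trace session, and tracerpt decodes it to CSV. The session and file names below are placeholders, and this isn't necessarily the plumbing the repo uses.

import subprocess

SESSION = "IEFrameTrace"  # arbitrary session name

# Start an event trace session for the Microsoft-IEFRAME provider.
subprocess.check_call(["logman", "start", SESSION,
                       "-p", "Microsoft-IEFRAME",
                       "-o", "ieframe.etl", "-ets"])

raw_input("Tracing... press Enter to stop.")

# Stop the session and decode the binary trace to CSV.
subprocess.check_call(["logman", "stop", SESSION, "-ets"])
subprocess.check_call(["tracerpt", "ieframe.etl",
                       "-o", "ieframe.csv", "-of", "CSV"])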

Encoding it

I give the data to NuPIC as three columns: timestamp, thread ID, and event. I treat the thread ID and the event as strings, which NuPIC encodes as categories (note the SDRCategoryEncoder in the model params below).

20:35:34.908, Thread 2408, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start)
20:35:49.830, Thread 5336, (Microsoft-IEFRAME LegacyHistoryQuery 52 Start)
20:35:49.830, Thread 5336, (Microsoft-IEFRAME LegacyHistoryQuery 53 Stop)
20:36:00.899, Thread 3652, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
20:36:00.912, Thread 3652, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start)
20:36:00.003, Thread 3652, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
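
For the curious: NuPIC's FileRecordStream expects three header rows before the data -- field names, field types, and special flags, where "T" marks the timestamp field. A minimal sketch of writing the stream in that format (the rows variable here is a stand-in for the captured events):

import csv

with open("data/TwentyGithubTabs1967.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "thread_id", "event"])  # field names
    writer.writerow(["datetime", "string", "string"])     # field types
    writer.writerow(["T", "", ""])                        # flags: T = timestamp field
    for timestamp, thread_id, event in rows:              # rows: captured events
        writer.writerow([timestamp, thread_id, event])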

Configuring the model

The swarm

My input:

SWARM_CONFIG = {
    "includedFields": [
        {
            "fieldName": "timestamp",
            "fieldType": "datetime",
        },
        {
            "fieldName": "thread_id",
            "fieldType": "string",
        },
        {
            "fieldName": "event",
            "fieldType": "string",
        },
    ],
    "streamDef": {
        "info": "event",
        "version": 1,
        "streams": [
            {
                "info": "TwentyGithubTabs1967",
                "source": "file://data/TwentyGithubTabs1967.csv",
                "columns": [
                    "*"
                ]
            }
        ]
    },
    "inferenceType": "TemporalAnomaly",
    "inferenceArgs": {
        "predictionSteps": [
            1
        ],
        "predictedField": "event"
    },
    "swarmSize": "medium"
}
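
Running the swarm over this config is a one-liner with NuPIC's permutations_runner. Something like the sketch below, where maxWorkers just depends on your machine:

from nupic.swarming import permutations_runner

# Swarm over the config above; returns the best model params found.
MODEL_PARAMS = permutations_runner.runWithConfig(
    SWARM_CONFIG, {"maxWorkers": 4, "overwrite": True})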

Its output:

The swarm ignores most of my input -- it finds it can do better by focusing on the event sequence alone. (Note below that the encoders for thread_id and the timestamp fields are None.) Some day I'll prove it wrong.

MODEL_PARAMS = {'aggregationInfo': {'days': 0,
                     'fields': [],
                     'hours': 0,
                     'microseconds': 0,
                     'milliseconds': 0,
                     'minutes': 0,
                     'months': 0,
                     'seconds': 0,
                     'weeks': 0,
                     'years': 0},
 'model': 'CLA',
 'modelParams': {'anomalyParams': {u'anomalyCacheRecords': None,
                                   u'autoDetectThreshold': None,
                                   u'autoDetectWaitRecords': None},
                 'clParams': {'alpha': 0.050050000000000004,
                              'clVerbosity': 0,
                              'regionName': 'CLAClassifierRegion',
                              'steps': '1'},
                 'inferenceType': 'TemporalAnomaly',
                 'sensorParams': {'encoders': {u'event': {'fieldname': 'event',
                                                          'n': 121,
                                                          'name': 'event',
                                                          'type': 'SDRCategoryEncoder',
                                                          'w': 21},
                                               u'thread_id': None,
                                               u'timestamp_dayOfWeek': None,
                                               u'timestamp_timeOfDay': None,
                                               u'timestamp_weekend': None},
                                  'sensorAutoReset': None,
                                  'verbosity': 0},
                 'spEnable': True,
                 'spParams': {'columnCount': 2048,
                              'globalInhibition': 1,
                              'inputWidth': 0,
                              'maxBoost': 2.0,
                              'numActiveColumnsPerInhArea': 40,
                              'potentialPct': 0.8,
                              'seed': 1956,
                              'spVerbosity': 0,
                              'spatialImp': 'cpp',
                              'synPermActiveInc': 0.05,
                              'synPermConnected': 0.1,
                              'synPermInactiveDec': 0.05015},
                 'tpEnable': True,
                 'tpParams': {'activationThreshold': 14,
                              'cellsPerColumn': 32,
                              'columnCount': 2048,
                              'globalDecay': 0.0,
                              'initialPerm': 0.21,
                              'inputWidth': 2048,
                              'maxAge': 0,
                              'maxSegmentsPerCell': 128,
                              'maxSynapsesPerSegment': 32,
                              'minThreshold': 11,
                              'newSynapseCount': 20,
                              'outputType': 'normal',
                              'pamLength': 3,
                              'permanenceDec': 0.1,
                              'permanenceInc': 0.1,
                              'seed': 1960,
                              'temporalImp': 'cpp',
                              'verbosity': 0},
                 'trainSPNetOnlyIfRequested': False},
 'predictAheadTime': None,
 'version': 1}
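
From here, creating the model and feeding it events is standard OPF usage. A sketch, assuming the 2014-era import path (newer NuPIC renamed the module to model_factory) and a made-up date for the first record:

from datetime import datetime

from nupic.frameworks.opf.modelfactory import ModelFactory

model = ModelFactory.create(MODEL_PARAMS)
model.enableInference({"predictedField": "event"})

result = model.run({
    "timestamp": datetime(2014, 6, 21, 20, 35, 34, 908000),  # hypothetical date
    "thread_id": "Thread 2408",
    "event": "(Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start)",
})
print result.inferences["anomalyScore"]
print result.inferences["multiStepBestPredictions"][1]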

Making the output glance-able

I started with raw output that looked something like:

timestamp, thread_id, event, anomalyScore, prediction1
20:35:34.908, Thread 2408, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start), 0, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
20:35:49.830, Thread 5336, (Microsoft-IEFRAME LegacyHistoryQuery 52 Start), 1, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
20:35:49.830, Thread 5336, (Microsoft-IEFRAME LegacyHistoryQuery 53 Stop), 1, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
20:36:00.899, Thread 3652, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop), 1, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop)
20:36:00.912, Thread 3652, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start), 0, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start)
20:36:00.003, Thread 3652, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 114 Stop), 0, (Microsoft-IEFRAME Browseui_Tabs_WaitMessage 113 Start)

I changed it to a color-coded view: I assigned a color to each of the most common events, making the stream glance-able.
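
The mechanics are easy to sketch: count event frequencies, give the top handful stable colors, and leave the rest uncolored. A toy version of the idea using ANSI terminal colors:

from collections import Counter

ANSI_CODES = [31, 32, 33, 34, 35, 36]  # red, green, yellow, blue, magenta, cyan

def build_palette(events, n=6):
    # Map each of the n most common events to a stable color code.
    common = [event for event, _ in Counter(events).most_common(n)]
    return dict((event, ANSI_CODES[i]) for i, event in enumerate(common))

def colorize(event, palette):
    code = palette.get(event)
    if code is None:
        return event  # uncommon events stay uncolored
    return "\x1b[%dm%s\x1b[0m" % (code, event)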

Noteworthy moments in the output

Foreword

I used the generic UI thread loop events as "training wheels" for validating my model.

This weekend I never took the training wheels off.

The beginning
Overlapping start/stop events from multiple tabs

Even after hundreds of thousands of events, this still causes a burst of anomalies.

This is why I tried adding thread IDs, but the swarm still determined that it was better off without them. Maybe it would change its mind with a different dataset.

The traumatized model

These inexplicable anomalies keep happening. Sometimes they even happen when the event was correctly predicted with 100% likelihood!

The case of the traumatized model

As seen above, I get inexplicable bursts of high anomaly scores even when the prediction was 100% correct.

I have theories about why this happens, and they involve the classifier: the predictions come from the CLA classifier, while the anomaly score comes from the temporal pooler, so the two can disagree. To investigate, I needed to observe more than the OPF prediction result currently exposes.

Say hello to predictive_bits:
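
These are the flattened indices of the temporal pooler's predictive cells (2048 columns x 32 cells per column = 65536 possible bits, which matches the numbers below). A sketch of digging them out of the model -- note this reaches into OPF internals that are private and version-dependent:

import numpy

def get_predictive_bits(model):
    # Grab the temporal pooler instance out of the model (private API).
    tp = model._getTPRegion().getSelf()._tfdr
    # predicted[c][i] is nonzero when cell i of column c is predictive.
    predicted = tp.getPredictedState()
    # Flatten to bit indices: bit = column * cellsPerColumn + cell.
    return numpy.flatnonzero(predicted.reshape(-1))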

Let's explore two predictions.

The good:

The bad:

What were their bits?

The good:

[ 91 1633 3767 4638 4694 6107 10112 10113 10114 10115 10116 10117 10118 10119 10120 10121 10122 10123 10124 10125 10126 10127 10128 10129 10130 10131 10132 10133 10134 10135 10136 10137 10138 10139 10140 10141 10142 10143 12255 13593 14798 15988 18735 20896 20897 20898 20899 20900 20901 20902 20903 20904 20905 20906 20907 20908 20909 20910 20911 20912 20913 20914 20915 20916 20917 20918 20919 20920 20921 20922 20923 20924 20925 20926 20927 21216 21217 21218 21219 21220 21221 21222 21223 21224 21225 21226 21227 21228 21229 21230 21231 21232 21233 21234 21235 21236 21237 21238 21239 21240 21241 21242 21243 21244 21245 21246 21247 22683 23632 26332 26872 26944 26945 26946 26947 26948 26949 26950 26951 26952 26953 26954 26955 26956 26957 26958 26959 26960 26961 26962 26963 26964 26965 26966 26967 26968 26969 26970 26971 26972 26973 26974 26975 27362 29116 30374 34496 34497 34498 34499 34500 34501 34502 34503 34504 34505 34506 34507 34508 34509 34510 34511 34512 34513 34514 34515 34516 34517 34518 34519 34520 34521 34522 34523 34524 34525 34526 34527 39670 41878 44888 45797 51213 51262 53721 53824 53825 53826 53827 53828 53829 53830 53831 53832 53833 53834 53835 53836 53837 53838 53839 53840 53841 53842 53843 53844 53845 53846 53847 53848 53849 53850 53851 53852 53853 53854 53855 54197 55601 55778 56789 57243 62046 62143 63867 65225]

The bad:

[ 80 83 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4681 4685 4694 4696 4702 6080 6081 6082 6083 6084 6085 6086 6087 6088 6089 6090 6091 6092 6093 6094 6095 6096 6097 6098 6099 6100 6101 6102 6103 6104 6105 6106 6107 6108 6109 6110 6111 14785 14810 15968 15969 15970 15971 15972 15973 15974 15975 15976 15977 15978 15979 15980 15981 15982 15983 15984 15985 15986 15987 15988 15989 15990 15991 15992 15993 15994 15995 15996 15997 15998 15999 18726 18912 18913 18914 18915 18916 18917 18918 18919 18920 18921 18922 18923 18924 18925 18926 18927 18928 18929 18930 18931 18932 18933 18934 18935 18936 18937 18938 18939 18940 18941 18942 18943 19616 19617 19618 19619 19620 19621 19622 19623 19624 19625 19626 19627 19628 19629 19630 19631 19632 19633 19634 19635 19636 19637 19638 19639 19640 19641 19642 19643 19644 19645 19646 19647 20896 20897 20898 20899 20900 20901 20902 20903 20904 20905 20906 20907 20908 20909 20910 20911 20912 20913 20914 20915 20916 20917 20918 20919 20920 20921 20922 20923 20924 20925 20926 20927 22657 22666 22682 23616 23617 23618 23619 23620 23621 23622 23623 23624 23625 23626 23627 23628 23629 23630 23631 23632 23633 23634 23635 23636 23637 23638 23639 23640 23641 23642 23643 23644 23645 23646 23647 24544 24545 24546 24547 24548 24549 24550 24551 24552 24553 24554 24555 24556 24557 24558 24559 24560 24561 24562 24563 24564 24565 24566 24567 24568 24569 24570 24571 24572 24573 24574 24575 25728 25729 25730 25731 25732 25733 25734 25735 25736 25737 25738 25739 25740 25741 25742 25743 25744 25745 25746 25747 25748 25749 25750 25751 25752 25753 25754 25755 25756 25757 25758 25759 26865 26872 27363 27366 27369 27388 29090 29102 29110 29116 29119 30386 30393 33984 33985 33986 33987 33988 33989 33990 33991 33992 33993 33994 33995 33996 33997 33998 33999 34000 34001 34002 34003 34004 34005 34006 34007 34008 34009 34010 34011 34012 34013 34014 34015 36672 36673 36674 36675 36676 36677 36678 36679 36680 36681 36682 36683 36684 36685 36686 36687 36688 36689 36690 36691 36692 36693 36694 36695 36696 36697 36698 36699 36700 36701 36702 36703 39650 39657 39665 39669 41864 41865 41868 41871 41885 43136 43137 43138 43139 43140 43141 43142 43143 43144 43145 43146 43147 43148 43149 43150 43151 43152 43153 43154 43155 43156 43157 43158 43159 43160 43161 43162 43163 43164 43165 43166 43167 44881 44888 44893 45792 45793 45794 45795 45796 45797 45798 45799 45800 45801 45802 45803 45804 45805 45806 45807 45808 45809 45810 45811 45812 45813 45814 45815 45816 45817 45818 45819 45820 45821 45822 45823 46336 46337 46338 46339 46340 46341 46342 46343 46344 46345 46346 46347 46348 46349 46350 46351 46352 46353 46354 46355 46356 46357 46358 46359 46360 46361 46362 46363 46364 46365 46366 46367 49664 49665 49666 49667 49668 49669 49670 49671 49672 49673 49674 49675 49676 49677 49678 49679 49680 49681 49682 49683 49684 49685 49686 49687 49688 49689 49690 49691 49692 49693 49694 49695 51245 51249 51258 51262 53696 53697 53698 53699 53700 53701 53702 53703 53704 53705 53706 53707 53708 53709 53710 53711 53712 53713 53714 53715 53716 53717 53718 53719 53720 53721 53722 53723 53724 53725 53726 53727 54176 54177 54178 54179 54180 54181 54182 54183 54184 54185 54186 54187 54188 54189 54190 54191 54192 54193 54194 54195 54196 54197 54198 54199 
54200 54201 54202 54203 54204 54205 54206 54207 54496 54497 54498 54499 54500 54501 54502 54503 54504 54505 54506 54507 54508 54509 54510 54511 54512 54513 54514 54515 54516 54517 54518 54519 54520 54521 54522 54523 54524 54525 54526 54527 55584 55585 55586 55587 55588 55589 55590 55591 55592 55593 55594 55595 55596 55597 55598 55599 55600 55601 55602 55603 55604 55605 55606 55607 55608 55609 55610 55611 55612 55613 55614 55615 56769 56770 56786 56792 56796 56798 57216 57217 57218 57219 57220 57221 57222 57223 57224 57225 57226 57227 57228 57229 57230 57231 57232 57233 57234 57235 57236 57237 57238 57239 57240 57241 57242 57243 57244 57245 57246 57247 58048 58049 58050 58051 58052 58053 58054 58055 58056 58057 58058 58059 58060 58061 58062 58063 58064 58065 58066 58067 58068 58069 58070 58071 58072 58073 58074 58075 58076 58077 58078 58079 60512 60513 60514 60515 60516 60517 60518 60519 60520 60521 60522 60523 60524 60525 60526 60527 60528 60529 60530 60531 60532 60533 60534 60535 60536 60537 60538 60539 60540 60541 60542 60543 62023 62042 62046 63842 63865 63867 65216 65217 65218 65219 65220 65221 65222 65223 65224 65225 65226 65227 65228 65229 65230 65231 65232 65233 65234 65235 65236 65237 65238 65239 65240 65241 65242 65243 65244 65245 65246 65247]
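
A note for interpreting these dumps: the raw anomaly score is the fraction of currently-active columns that no cell predicted on the previous step, which is how a correct classifier prediction and a high anomaly score can coexist. A sketch of the relationship, assuming predictive bits are flattened cell indices as above:

import numpy

CELLS_PER_COLUMN = 32

def raw_anomaly_score(prev_predictive_bits, active_columns):
    # Columns that contained at least one predictive cell last step.
    predicted = set(numpy.asarray(prev_predictive_bits) // CELLS_PER_COLUMN)
    unpredicted = [c for c in active_columns if c not in predicted]
    return float(len(unpredicted)) / len(active_columns)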

Next steps


Closing

I've got more work to do. Building this kind of thing will take many deliberate small steps. I think this approach is worthwhile. And fun.