The DHS reads ~2% of all your tweets

The U.S. Department of Homeland Security recently released, under the FOIA, a set of keywords it uses to monitor social media. I have compiled the list into a single regex:

/(^|\s)(department of homeland security|dhs|federal emergency management agency|fema|coast guard|uscg|customs|border protection|cbp|border patrol|secret service|usss|national operations center|noc|homeland defense|immigration customs enforcement|ice|agent|task force|central intelligence agency|cia|fusion center|drug enforcement agency|dea|secure border initiative|sbi|federal bureau of investigation|fbi|alcohol|tobacco|firearms|atf|u.s. citizenship|immigration services|cis|federal air marshal service|fams|transportation security administration|tsa|air marshal|federal aviation administration|faa|national guard|red cross|united nations|un|assassination|attack|domestic security|drill|exercise|cops|law enforcement|authorities|disaster assistance|disaster management|dndo|domestic nuclear detection office|national preparedness|mitigation|prevention|response|recovery|dirty bomb|domestic nuclear detection|emergency management|emergency response|first responder|homeland security|maritime domain awareness|mda|national preparedness initiative|militia|shooting|shots fired|evacuation|deaths|hostage|explosion|explosive|police|disaster medical assistance team|dmat|organized crime|gangs|national security|state of emergency|security|breach|threat|standoff|swat|screening|lockdown|bomb|squad|threat|crash|looting|riot|emergency landing|pipe bomb|incident|facility|hazmat|nuclear|chemical spill|suspicious package|device|toxic|national laboratory|nuclear facility|nuclear threat|cloud|plume|radiation|radioactive|leak|biological infection|biological event|chemical|chemical burn|biological|epidemic|hazardous|hazardous material incident|industrial spill|infection|powder|gas|spillover|anthrax|blister agent|exposure|burn|nerve agent|ricin|sarin|north korea|outbreak|contamination|exposure|virus|evacuation|bacteria|recall|ebola|food poisoning|foot and mouth|fmd|h5n1|avian|flu|salmonella|small pox|plague|human to human|human to animal|influenza|center for disease control|cdc|drug 
administration|fda|public health|toxic|agro terror|tuberculosis|tb|agriculture|listeria|symptoms|mutation|resistant|antiviral|wave|pandemic|infection|water|air borne|sick|swine|pork|strain|quarantine|h1n1|vaccine|tamiflu|norvo virus|epidemic|world health organization|viral hemorrhagic fever|e. coli|infrastructure security|airport|cikr|critical infrastructure|key resources|amtrak|collapse|computer infrastructure|communications infrastructure|telecommunications|critical infrastructure|national infrastructure|metro|wmata|airplane|plane|aeroplane|chemical fire|subway|bart|marta|port authority|nbic|national biosurveillance integration center|transportation security|grid|power|smart|body scanner|electric|failure|outage|black out|brown out|port|dock|bridge|canceled|delays|service disruption|power lines|drug cartel|violence|gang|drug|narcotics|cocaine|marijuana|heroin|border|mexico|cartel|southwest|juarez|sinaloa|tijuana|torreon|yuma|tucson|decapitated|u.s. consulate|consular|el paso|fort hancock|san diego|ciudad juarez|nogales|sonora|colombia|mara salvatrucha|ms13|ms-13|drug war|mexican army|methamphetamine|cartel de golfo|gulf cartel|la familia|reynose|nuevo leon|narcos|narco banner|los zetas|shootout|execution|gunfight|trafficking|kidnap|calderon|reyosa|bust|tamaulipas|meth lab|drug trade|illegal immigrants|smuggling|smugglers|matamoros|michoacana|guzman|arellano-felix|beltran-leyva|barrio azteca|artistics assassins|mexicles|new federation|terrorism|al queda|al qaeda|al-qaeda|terror|attack|iraq|afghanistan|iran|pakistan|agro|environmental terrorist|eco terrorism|conventional weapon|target|weapons grade|dirty bomb|enriched|nuclear|chemical weapon|biological weapon|ammonium nitrate|improvised explosive device|ied|improvised explosive device|abu sayyaf|hamas|farc|armed revolutionary forces colombia|ira|irish republican army|eta|euskadi ta askatasuna|basque separatists|hezbollah|tamil tiger|plf|palestine liberation front|plo|palestine libration organization|car 
bomb|jihad|taliban|weapons cache|suicide bomber|suicide attack|suspicious substance|aqap|al qaeda arabian peninsula|aqim|al qaeda in the islamic maghreb|ttp|tehrik-i-taliban pakistan|yemen|pirates|extremism|somalia|nigeria|radicals|al-shabaab|home grown|plot|nationalist|recruitment|fundamentalism|islamist|emergency|hurricane|tornado|twister|tsunami|earthquake|tremor|flood|storm|crest|temblor|extreme weather|forest fire|brush fire|ice|stranded|stuck|help|hail|wildfire|tsunami warning center|magnitude|avalanche|typhoon|shelter-in-place|disaster|snow|blizzard|sleet|mud slide|mudslide|erosion|power outage|brown out|warning|watch|lightening|aid|relief|closure|interstate|burst|emergency broadcast system|cyber security|botnet|ddos|dedicated denial of service|denial of service|malware|virus|trojan|keylogger|cyber command|2600|spammer|phishing|rootkit|phreaking|cain and abel|brute forcing|mysql injection|cyber attack|cyber terror|hacker|china|conficker|worm|scammers|social media)(\s|$)/

From a semi-random sample of 5000 English tweets, the regex matches 114 (about 2.3%). Statistical caveats aside, this means that on average the DHS is reading roughly 1 out of every 50 tweets you post.

I had to make some assumptions about how the list is used, notably case insensitivity and respect for word boundaries. I split terms like “Airplane (and derivatives)” into multiple keywords (“airplane”, “aeroplane” and “plane”) and separated out acronyms. Given the implication that the DHS uses TweetDeck for monitoring, I believe these assumptions are reasonable.
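As a rough illustration of those assumptions, here is a minimal Ruby sketch of building such a pattern from a keyword list and measuring how often it matches. The helper names (`keyword_regex`, `match_rate`) and the five-keyword excerpt are made up for this example, and the `i` flag is my reading of the case-insensitivity assumption; this is not the script that produced the figure above.

```ruby
# A tiny excerpt of the keyword list, for illustration only.
KEYWORDS = ["dhs", "u.s. consulate", "pipe bomb", "ms-13", "flu"].freeze

# Build one case-insensitive pattern from the keywords.
# Regexp.escape keeps characters such as "." and "-" literal.
# As in the regex above, a keyword must be delimited by whitespace
# or by the start/end of a line.
def keyword_regex(keywords)
  alternation = keywords.map { |k| Regexp.escape(k) }.join("|")
  /(^|\s)(#{alternation})(\s|$)/i
end

# Fraction of tweets that contain at least one keyword.
def match_rate(tweets, regex)
  return 0.0 if tweets.empty?
  tweets.count { |t| t.match?(regex) }.fdiv(tweets.size)
end

if __FILE__ == $0
  sample = ["I have the flu today", "Fluent in Ruby", "PIPE BOMB found", "nothing here"]
  puts match_rate(sample, keyword_regex(KEYWORDS))
end
```

Note that with whitespace-only boundaries a keyword followed directly by punctuation ("flu.") is not counted, so a rate measured this way is, if anything, on the low side.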

This is what happens when people ignore me!

#!/usr/bin/env ruby

require 'pony'
require 'base64'

def pick_subject
  possibles = ["Even bytes get lonely for a little bit.", "All the lonely users, where do they all come from?", "Life is very long when you're lonely.", "Solitude is the profoundest fact of the human condition.", "Loneliness is a barrier that prevents one from uniting with the inner self.", "Absolute silence leads to sadness. It is the image of death.", "Good humor is the health of the soul, sadness is its poison.", "I don't need to manufacture trauma in my life to be creative.", "In deep sadness there is no place for sentimentality.", "It's the poignancy and sadness in things that gets to me.", "Sadness is also a kind of defence.", "Sadness is but a wall between two gardens.", "The sadness of the incomplete, the sadness that is often Life, but should never be Art.", "The walls we build around us to keep sadness out also keeps out the joy.", "You get used to sadness, growing up in the mountains, I guess.", "The surest cure for vanity is loneliness.", "There is no loneliness greater than the loneliness of a failure.", "When friendship disappears then there is a space left open to that awful loneliness of the outside world.", "What makes loneliness an anguish is not that I have no one to share my burden, but this: I have only my own burden to bear.", "The sky is one whole, the water another; and between those two infinities the soul of man is in loneliness.", "Loneliness is never more cruel than when it is felt in close propinquity with someone who has ceased to communicate.", "Loneliness is the ultimate poverty.", "Music was invented to confirm human loneliness."]

  possibles.sample
end

def pick_image_path
  Dir.glob('/home/mispy/sadkittens/*').sample
end

def sad_kitten(target)
  imgpath = pick_image_path
  imgfn = imgpath.split('/')[-1]
  body = %{
This is a multi-part message in MIME format.
--multipart_related_boundary
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

<!DOCTYPE html>
<html>
<head>
</head>
<body>
<img src="cid:sadkitten">
</body>
</html>

--multipart_related_boundary
Content-Type: image/jpeg; name="#{imgfn}"
Content-Transfer-Encoding: base64
Content-ID: <sadkitten>
Content-Disposition: inline; filename="#{imgfn}"

#{Base64.encode64(File.binread(imgpath))}
  }

  Pony.mail(:to => target,
            :from => 'Sad Kitten Foundation <sadkitten@mispy.me>',
            :subject => pick_subject,
            :headers => { 'Content-Type' => 'multipart/related; boundary="multipart_related_boundary"' },
            :body => body)
end

if __FILE__ == $0
  sad_kitten(ARGV.first)
end
@daily bash -l -c "/home/mispy/scripts/sadkitten target@gmail.com"

Deviation Collector

I made a little Chrome extension to express my love for CoffeeScript and deviantART! The former is a marvellous language that makes JavaScript more palatable to a Rubyist, while the latter is simply an endless font of cute. My extension taps into dA’s API to give you a periodically refreshing counter of new deviations produced by users you are watching, and will open these in tabs when clicked.

This project was slightly interesting due to the closed nature of the API, requiring a small amount of reverse-engineering. The Chrome developer tools remain an excellent choice for this; they even have a JS deobfuscator now! Fortunately, the DiFi request structure is fairly straightforward, and some nice hackers had already done much of the work for me. Code for the extension is on GitHub if you want to make a similar application of your own, but keep in mind that this API isn’t officially documented and deviantART management may not take too kindly to overuse!

Minecraft: Collage World

A silly project I undertook recently: collecting over 3000 individual Minecraft creation schematics and merging them all into a single, playable world. The results are, well… about as crazy as you would expect:

Here is a dynmap birds-eye view:

You can grab a copy of the world here if you want to mess around in it!

Methodology

The key to this project is the PhoenixTerrainMod customisable world generator, which provides a system for integrating player-designed objects into the generation process. I originally sourced 5488 schematic files from a crawl of the mcschematics.com forums. Converting these to the .bo2 format required by PTM was a somewhat arduous task, so kudos to the lovely @unnali for rewriting my Ruby script in C and rendering it at least 10x faster! Of the original dataset, I ended up excluding every object with a duplicate filename or over 100 KB in size (a total of 2368 files, leaving 3120) to fit time and memory constraints.
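The exclusion step can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions, not the script I actually ran: the name `select_schematics` is hypothetical, "100kb" is read as 100 * 1024 bytes, and when two files share a name the first one seen is kept.

```ruby
MAX_BYTES = 100 * 1024 # assumed reading of "100kb"

# entries: array of [path, size_in_bytes] pairs.
# Drops anything larger than max_bytes and, as an assumption,
# every file whose basename has already been kept once.
def select_schematics(entries, max_bytes: MAX_BYTES)
  seen = {}
  entries.select do |path, size|
    name = File.basename(path)
    next false if size > max_bytes || seen[name]
    seen[name] = true
  end
end

# Usage against a real directory (the path is a placeholder):
#   entries = Dir.glob("schematics/**/*.schematic").map { |p| [p, File.size(p)] }
#   keep = select_schematics(entries).map(&:first)
```

Taking sizes as plain data rather than calling File.size inside the filter keeps the function easy to try out without the dataset on disk.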

Downloads

5488 raw .schematic files (74.8MB): 5488schematics.tar.bz2
3120 processed .bo2 files (19.7MB): 3120bobs.tar.bz2
Collage world directory (50.5MB): collage.tar.bz2
.schematic -> .bo2 mass converter: sch2bob on GitHub

Quixotic symbolism of Lisps

I ported the Emacs psychotherapist from Emacs Lisp to Common Lisp just now, as a means of learning more about each language. This one particular discrepancy took an unreasonable amount of time for me to notice:

ELISP> (eq 'foo (intern "foo"))
t
CL-USER> (eq 'foo (intern "foo"))
NIL

The cause of this madness? Observe:

ELISP> (intern "foo")
foo
ELISP> (intern "FOO")
FOO
CL-USER> (intern "foo")
|foo|
:INTERNAL
CL-USER> (intern "FOO")
FOO
:INTERNAL

Symbols in Emacs Lisp are case-sensitive, much like strings, whereas the Common Lisp reader upcases unescaped symbol names by default, so 'foo is read as FOO. Intern takes its string argument literally, and the printer marks the resulting lowercase name with these curious vertical bars. This may also trip you up in the inverse case of symbol->string conversion: make sure you use symbol-name or string instead of write-to-string!

The Political Compass

I tend to take this test once every year or so; it’s interesting to see how your opinions evolve over time. I think I may have drifted very slightly to the right, possibly because I’m just as disillusioned with governments as I am with corporations nowadays. It also seems my views on privacy and free speech have been accentuated a little by personal experience and the political climate.

Displaying a ProgressDialog while a WebView is loading content

I believe this is the simplest solution:

final ProgressDialog pd = ProgressDialog.show(this, "", "Loading!", true);
setContentView(R.layout.webview);
mWebView = (WebView) findViewById(R.id.webview);
mWebView.setWebViewClient(new WebViewClient() {
  @Override
  public void onPageFinished(WebView view, String url) {
    pd.dismiss();
  }
});
mWebView.loadUrl("file:///android_asset/index.html");

Don’t be leaving your users with an ugly blank screen! :)

Culture shock

Consider the common Hash, a well-known inhabitant of the world of Ruby.

hashy = { :kvpairs => 'yo', :syntactic_sugar => 'coolness', :trope => "What Do You Mean, It's Not Awesome?" }
hashy.each { |k, v| puts "Iteration: #{v}" }

Now witness the Java HashMap, a creature believed by some radical structologists to share common conceptual descent with our bracetacular friend.

HashMap<String, Object> ugh = new HashMap<String, Object>();
ugh.put("verbosity", "high");
ugh.put("type_signature", "disturbing");
ugh.put("complaints", "many");

for (String key : ugh.keySet()) {
  Object value = ugh.get(key);
  Log.i("please kill me", "Hyperbolisms: " + value);
}

Admittedly, it’s not that bad. At one point Java didn’t even have for-each style loops, and demanded explicit iterator construction. Still, my foray into Android app creation is certainly making apparent how dependent I have become on higher-level, dynamically-typed languages where simplicity and elegance of coding form are key. I need to be more flexible!

Empty sessions in Rails 3 when using jQuery AJAX requests

This problem has a known fix, but it took me a little while to find. Rails resets the session when a non-GET request arrives without a valid CSRF token, so what you need to do is explicitly tell jQuery to send the token along with every AJAX request:

$(document).ajaxSend(function(e, xhr, options) {
  var token = $("meta[name='csrf-token']").attr("content");
  xhr.setRequestHeader("X-CSRF-Token", token);
});

Just stick that somewhere in your layout head and your sessions should come back :)