Thursday, July 7, 2016

Using Python to Access Web Data, Week 6: Extracting Data from JSON

The problem:
In this assignment you will write a Python program somewhat similar to http://www.pythonlearn.com/code/json2.py. The program will prompt for a URL, read the JSON data from that URL using urllib, then parse the data, extract the comment counts, and compute the sum of the numbers in the file. That sum is the value to enter as the answer.
We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment.
You do not need to save these files to your folder since your program will read the data directly from the URL. Note: Each student will have a distinct data URL for the assignment, so only use your own data URL for analysis.
Data Format
The data consists of a number of names and comment counts in JSON as follows:
{
  "comments": [
    {
      "name": "Matthias",
      "count": 97
    },
    {
      "name": "Geomer",
      "count": 97
    }
    ...
  ]
}
The closest sample code that shows how to parse JSON and extract a list is json2.py. You might also want to look at geoxml.py to see how to prompt for a URL and retrieve data from a URL.
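As a minimal sketch of the parsing step described above, the snippet below runs json.loads on a small in-memory string shaped like the data format and sums the count fields. It is written for Python 3, and the two-record sample string and variable names are illustrative only, not the real assignment data.

```python
import json

# Illustrative sample in the assignment's data format (not the real data file)
sample = '''
{
  "comments": [
    {"name": "Matthias", "count": 97},
    {"name": "Geomer", "count": 97}
  ]
}
'''

info = json.loads(sample)      # dict with a single "comments" key
comments = info["comments"]    # list of dicts, each with "name" and "count"

# Add up the "count" field of every record
total = sum(int(item["count"]) for item in comments)
print("Count:", len(comments))
print("Sum:", total)
```

Once the text comes from a URL instead of a literal string, the parsing and summing lines stay the same.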
Sample Execution
$ python solution.py 
Enter location: http://python-data.dr-chuck.net/comments_42.json
Retrieving http://python-data.dr-chuck.net/comments_42.json
Retrieved 2733 characters
Count: 50
Sum: 2...
My submission:

 import json
 import urllib

 # Python 2: prompt for the URL and fetch the raw JSON text
 url = raw_input('Enter site: ')
 print 'Retrieving', url
 data = urllib.urlopen(url).read()
 print 'Retrieved', len(data), 'characters'

 # Parse the JSON and pull out the list of comment records
 info = json.loads(data)
 comments = info['comments']
 print 'Count:', len(comments)

 # Add up the "count" field of every comment
 tot = sum(int(item['count']) for item in comments)
 print 'Sum:', tot
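
The submission above targets Python 2 (raw_input, the print statement, urllib.urlopen). A rough Python 3 equivalent might look like the sketch below. The helper names summarize_comments and main are hypothetical, chosen here so that the parsing part can be tried without a network connection.

```python
import json
import urllib.request


def summarize_comments(text):
    """Return (number of comments, sum of their counts) for a JSON string."""
    info = json.loads(text)
    comments = info["comments"]
    return len(comments), sum(int(item["count"]) for item in comments)


def main():
    """Prompt for a URL, download the JSON, and print the count and sum."""
    url = input("Enter location: ")
    print("Retrieving", url)
    # decode() turns the downloaded bytes into a str
    data = urllib.request.urlopen(url).read().decode()
    print("Retrieved", len(data), "characters")
    count, total = summarize_comments(data)
    print("Count:", count)
    print("Sum:", total)
```

Calling main() runs the interactive version. summarize_comments can also be called directly on any JSON string in the format shown under Data Format.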

6 comments:

  1. import json
    import urllib.request,urllib.parse,urllib.error
    url=input("Enter: ")
    data=urllib.request.urlopen(url).read().decode()
    item=json.loads(data)
    print("Count: ",len(item["comments"]))
    total=0
    for i in range(0,len(item["comments"])):
        total=total+int(item["comments"][i]["count"])
    print(total)


  2. import urllib.request, urllib.parse, urllib.error
    import json
    import ssl


    # Ignore SSL certificate errors
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    while True:
        address = input('Enter url: ')
        if len(address) < 1: break

        dictionary = dict()
        dictionary['address'] = address

        print('Retrieving', address)
        uh = urllib.request.urlopen(address, context=ctx)
        data = uh.read().decode()
        print('Retrieved', len(data), 'characters')
        try:
            js=json.loads(data)
        except:
            js=None

        sum=0
        for item in data:
            x=int(item['comments'][0]['count'])
            sum=sum+x
        print(sum)


    What's wrong with this?

  3. I keep getting an attribute error: json has no attribute loads.

  4. import json
    import urllib.request, urllib.error
    url= input('Enter site: ')
    print('Retrieving ', url)
    data = urllib.request.urlopen(url).read()
    info = json.loads(data)
    tot = 0
    print('Retrieved ', len(data), 'characters' )
    print('Count: ', len(info['comments']))
    for i in range(0, len(info['comments'])):
        tot += int(info['comments'][i]['count'])
    print('Sum ', tot)

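
On the question in the second comment, one likely culprit is the loop `for item in data`. It walks over the raw text one character at a time rather than over the parsed comment list, so `item['comments']` fails with a TypeError. A small sketch of the difference in Python 3, using a made-up JSON string in place of the downloaded data:

```python
import json

# Made-up stand-in for the text returned by uh.read().decode()
data = '{"comments": [{"name": "A", "count": 5}, {"name": "B", "count": 7}]}'
js = json.loads(data)

# Iterating over the string yields single characters, not comment records
first_items = list(data)[:3]
print(first_items)

# Iterating over the parsed list yields one dict per comment
total = 0
for item in js["comments"]:
    total += int(item["count"])
print(total)
```

Looping over `js['comments']` and reading `item['count']` from each record, rather than indexing `['comments'][0]` every time, is what lets every comment contribute to the sum.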