Building a beautiful design is a great experience. Seeing the design break apart when people start putting in real content, though, is painful. That’s why testing it as soon as possible with real information to see how it fares is so important. To this end, Web services provide us with a lot of information with which to fill our products. Until recently, this was a specialist’s job, but the sheer amount of information available and the number of systems that consume it make Web services easier and easier to use, even for people without much development experience.
On Programmable Web, you can find (to date) 2580 different application programming interfaces (or APIs). An API allows you to get access to an information provider’s data in a raw format and reformat it to suit your needs.
The Trouble With APIs
The problem with APIs is that access to them varies in simplicity, from just having to load data from a URL all the way up to having to authenticate with the server and give all kinds of information about the application you want to build before getting your first chunk of information.
Each API is based on a different idea of what information you need to provide, what format it should be in, what data it will give back and in what format. All this makes using third-party APIs in your products very time-consuming, and the pain multiplies with each one you use. If you want to get photos from Flickr and updates from Twitter and then show the geographical information in Twitter on a map, then you have quite a trek ahead.
Simplifying API Access
Yahoo uses APIs for nearly all of its products. Instead of accessing a database and displaying the information live on the screen, the front end calls an API, which in turn gets the information from the back end, which talks to databases. This gives Yahoo the benefit of being able to scale to millions of users and being able to change either the front or back end without disrupting the other.
Because the APIs have been built over 10 years, they all vary in format and the way in which you access them. This cost Yahoo too much time, which is why it built Yahoo Pipes to ease the process.
Pipes is amazing. It is a visual way to mix and match information from the Web. However, as people used Pipes more, they ran into limitations. Versioning pipes was hard; to change the functionality of a pipe just slightly, you had to go back to the system; and it tended to slow down with very complex and large conversions. This is why Yahoo offers a new system for people whose needs change a lot or get very complex.
YQL is both a service and a language (Yahoo Query Language). It makes consuming Web services and APIs dead simple, both in terms of access and format.
Retrieving Data With YQL
The easiest way to access YQL is to use the YQL console. This tool allows you to preview your YQL work and play with the system without having to know any programming at all. The interface is made up of several components:
- The YQL statement section is where you write your YQL query. YQL has a very simple syntax, and we’ll get into its details a bit later on. Now is the time to try it out. Enter your query, define the output format (XML or JSON), check whether to have diagnostics reporting, and then hit the “Test” button to see the information. There is also a permalink; click it to make sure you don’t lose your work in case you accidentally hit the “Back” button.
- The results section shows you the information returned from the Web service. You can either read it in XML or JSON format or click the “Tree view” to navigate the data in an Explorer-like interface.
- The REST query section gives you the URL of your YQL query. You can copy and paste this URL at any time to use it in a browser or program. Getting information from different sources with YQL is actually this easy.
- The queries section gives you access to queries that you previously entered. You can define query aliases for yourself (much as you would bookmark websites), get a history of the latest queries (very useful in case you mess up) and get some sample queries to get started.
- The data tables section lists all the Web services you can access using YQL. Clicking the name of a table will in most cases open a demo query in the console. If you hover over the link, you’ll get two more links: desc, which gives you information about the parameters that the Web service allows, and src, which shows the source of the data table itself. In most cases, all you need to do is click the name. You can also filter the data table list by typing what you’re looking for.
Using YQL Data
By far the easiest way to use YQL data is to select JSON as the output format and define a callback function. If you do that, you can then copy and paste the URL from the console and write a very simple JavaScript to display the information in HTML. Let’s give that a go.
As a very simple example, let’s get some photos from Flickr for the search term “cat”:
select * from flickr.photos.search where text="cat"
Type that into the YQL console, and hit the “Test” button. You will get the results in XML, with a lot of information about the photos.
Instead of XML, choose JSON as the output format, and enter myflickr as the callback function name. You will get the same information as a JSON object inside a call to the function myflickr.
You can then copy the URL created in the “REST query” field:
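The URL in that field follows a predictable pattern, so it can also be assembled in code. The sketch below assumes the public endpoint that the console displayed at the time of writing (query.yahooapis.com/v1/public/yql) and the q, format and callback parameters; buildYqlUrl is a hypothetical helper, not something YQL provides.

```javascript
// Assumed public YQL endpoint, as shown in the console's "REST query" field.
var YQL_ENDPOINT = 'http://query.yahooapis.com/v1/public/yql';

// Hypothetical helper: turns a YQL statement into a REST URL.
// format defaults to JSON; callback is optional and names the JSON-P function.
function buildYqlUrl(query, format, callback) {
  var url = YQL_ENDPOINT +
    '?q=' + encodeURIComponent(query) +
    '&format=' + encodeURIComponent(format || 'json');
  if (callback) {
    url += '&callback=' + encodeURIComponent(callback);
  }
  return url;
}

var flickrUrl = buildYqlUrl(
  'select * from flickr.photos.search where text="cat"',
  'json',
  'myflickr'
);
```

Building the URL this way keeps the query readable in your source and leaves the escaping of spaces and quotation marks to encodeURIComponent.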
Write a JavaScript function called myflickr with a parameter data, and copy and paste the URL as the src of another script block.
If you run this inside a browser, the URL you copied will retrieve the data from the YQL server and send it to the myflickr function as the data parameter. The data parameter is an object that contains all the returned information from YQL. To make sure you have received the right information, test whether the data.query.results property exists; then you can loop over the result set.
You can easily get the structure of the information and know what is loop-able by checking the tree view of the results field in the console.
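The callback described here might look like the following sketch. It assumes the Flickr results arrive as data.query.results.photo, with a title on each entry (check the tree view for the actual structure); photoTitles is a hypothetical helper added so that the loop can be tried outside a browser as well.

```javascript
// In the page, the data is requested with a second script block whose src
// is the URL copied from the console, ending in &callback=myflickr:
//   <script src="...the REST query URL..."></script>

// Hypothetical helper: collects the photo titles from a YQL response.
// Assumes the photos live in data.query.results.photo.
function photoTitles(data) {
  if (!data || !data.query || !data.query.results) {
    return [];
  }
  // Defensive: accept a single photo object as well as an array of photos.
  var photos = [].concat(data.query.results.photo || []);
  return photos.map(function (photo) {
    return photo.title;
  });
}

// The JSON-P callback named in the console.
function myflickr(data) {
  // Use alert in a browser; fall back to console.log elsewhere.
  var show = typeof alert === 'function' ? alert : console.log;
  photoTitles(data).forEach(function (title) {
    show(title);
  });
}
```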
Right now, all this does is display the titles of the retrieved photos as alerts, which is nothing but annoying. To display the photos in the right format, we need a bit more — but no magic either:
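A minimal sketch of that step follows. The farm, server, id and secret fields and the _s (small square) suffix follow Flickr's photo source URL scheme; the host name used here, the flickr element id and the helper names are assumptions for illustration.

```javascript
// Hypothetical helper: escapes text for safe use inside an HTML attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: builds the thumbnail URL for one photo record.
function flickrThumb(photo) {
  return 'https://farm' + photo.farm + '.staticflickr.com/' +
    photo.server + '/' + photo.id + '_' + photo.secret + '_s.jpg';
}

// Turns the YQL response into an unordered list of thumbnails.
function photosToHtml(data) {
  if (!data || !data.query || !data.query.results) {
    return '';
  }
  var photos = [].concat(data.query.results.photo || []);
  var items = photos.map(function (photo) {
    return '<li><img src="' + flickrThumb(photo) +
      '" alt="' + escapeHtml(photo.title) + '"></li>';
  });
  return '<ul>' + items.join('') + '</ul>';
}

// The JSON-P callback: writes the list into the page when run in a browser.
function myflickr(data) {
  var html = photosToHtml(data);
  if (typeof document !== 'undefined') {
    var target = document.getElementById('flickr'); // assumed container id
    if (target) {
      target.innerHTML = html;
    }
  }
  return html;
}
```

Escaping the title matters because it comes from a third party and ends up inside your markup.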
Put this into action and you’ll get photos of cats, live from Flickr and without having to go through any painful authentication process.
The complexity of the resulting HTML for display differs from data set to data set, but in essence the main trick remains the same: define a callback function, write it, copy and paste the URL you created in the console, test that data has been returned, and then go nuts.
Using YQL To Reuse HTML Content
One other very powerful use of YQL is to access HTML content on the Web and filter it for reuse. This is usually called “scraping” and is a pretty painful process. YQL makes it easier because of two things: it cleans up the HTML retrieved from a website by running it through HTML Tidy, and it allows you to filter the result with XPath. As an example, let’s retrieve the list of my upcoming conferences and display it.
Go to http://icant.co.uk/ to see my upcoming speaking engagements.
You can then use Firebug in Firefox to inspect this section of the page. Simply open Firebug, click the box with the arrow icon next to the bug, and move the cursor around the page until the blue border is around the element you want to inspect:
Right-click the selection, and select “Copy XPath” from the menu:
Go to the YQL console, and type in the following:
select * from html where url="http://icant.co.uk" and xpath=''
Copy the XPath from Firebug into the query, and hit the “Test” button.
select * from html where url="http://icant.co.uk" and xpath='//*[@id="travels"]'
As you can see, this gets the HTML of the section that we want inside some XML. The easiest way to reuse this in HTML is by requesting a format that YQL calls JSON-P-X. This will return a simple JSON object with the HTML as a string. To use this, do the following:
- Copy the URL from the REST field in the console.
- Add &format=xml&callback=travels to the end of the URL.
- Add this as the src to a script block, and write this terribly simple JavaScript function:
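A sketch of such a function follows. It assumes that the JSON-P-X response carries the matched markup as strings in a top-level results array and that the page has an element with the id travels to receive it; verify both against what the console returns.

```javascript
// Hypothetical helper: pulls the first HTML string out of a JSON-P-X response.
// Assumes the markup is delivered in a top-level "results" array.
function travelsHtml(data) {
  if (data && data.results && data.results.length) {
    return String(data.results[0]);
  }
  return '';
}

// The JSON-P callback named in the URL (&callback=travels).
function travels(data) {
  var html = travelsHtml(data);
  if (html && typeof document !== 'undefined') {
    var target = document.getElementById('travels'); // assumed container id
    if (target) {
      target.innerHTML = html;
    }
  }
  return html;
}
```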
The result is an unordered list of my events on your website:
Debugging YQL Queries
Things will go wrong, and having no idea why is terribly frustrating. The good news with YQL is that you will get error messages that are actually human-readable. If something fails in the console, you will see a big box under the query telling you what the problem was.
Furthermore, you will see a diagnostics block in the data returned from YQL that tells you in detail what happened “under the hood.” If there are any problems accessing a certain service, it will show up there.
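The same information can be checked in your callback before you touch the results. The sketch below is an assumption about the response shape (an error.description field on failure, and a query.diagnostics.url entry or list of entries whose content holds the address that was requested); confirm the field names in the console's tree view before relying on them. yqlReport is a hypothetical helper.

```javascript
// Hypothetical helper: summarises what YQL reported for a response.
// Field names are assumptions; see the note above.
function yqlReport(data) {
  if (data && data.error) {
    return {
      ok: false,
      message: data.error.description || 'Unknown error',
      urls: []
    };
  }
  var query = (data && data.query) || {};
  var diagnostics = query.diagnostics || {};
  // Accept one entry or a list of entries, given as strings or objects.
  var urls = [].concat(diagnostics.url || []).map(function (entry) {
    return typeof entry === 'string' ? entry : entry.content;
  });
  return {
    ok: Boolean(query.results),
    message: query.results ? 'OK' : 'No results returned',
    urls: urls
  };
}
```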
YQL Syntax
The basic syntax of YQL is very easy:
select {what} from {source} where {conditions}
You can filter your results, cut the information down only to the bits you want, paginate the results and nest queries in others. For all the details of the syntax and its nuances, check the extensive YQL documentation.
YQL Examples
You can do quite amazing things with YQL. By nesting statements in parentheses and filtering the results, you can reach far and wide across the Web of data. Copy and paste the following examples into the console to play with them.
- Search Google for the word “smashing”:
select * from google.search where q="smashing"
- Search Google for the word “smashing” but only return the titles and URLs:
select url,title from google.search where q="smashing"
- Grab the titles of the latest articles from the Smashing Magazine feed and translate them to French:
select * from google.translate where q in (select title from feed where url="http://rss1.smashingmagazine.com/feed/") and target="fr"
- Find the most recent mentions of the word “ipad” on Delicious and YouTube (using the Yahoo Firehose):
select * from social.updates.search where query="ipad" and source in ("delicious","youtube")
- Show sushi restaurants in San Francisco. Get 50 results, but limit the amount to 20 and skip the first 10:
select * from local.search(50) where query="sushi" and location="san francisco, ca" limit 20 offset 10
- Show photos taken in Paris, France (by defining Paris as a Where on Earth (WoE) ID from the geoplanet data set and then searching Flickr for photos with that WoE ID):
select * from flickr.photos.search where woe_id in (select woeid from geo.places where text="Paris,France")
- Get the Twitter names of all speakers at SWDC-central.com, and then search Twitter for tweets mentioning them:
select text,from_user from twitter.search where q in (select content from html where url="http://swdc-central.com/" and xpath="//a[contains(.,'@')]")
YQL’s Limits
YQL has a few (sensible) limits:
- You can access the URL 10,000 times an hour; after that you will be blocked. It doesn’t matter in our case because the blocking occurs per user, and since we are using JavaScript, this affects our end users individually and not our website. If you use YQL on the back end, you should cache the results and also authenticate to the service via OAuth to be allowed more requests.
- The language allows you to retrieve information; insert, update and delete from data sets; and limit the amount of data you get back. You can get paginated data (0 to 20, 20 to 40 and so on), and you can sort and find unique entries. What you can’t do in the YQL syntax is more complex queries, like “Get me all data sets in which the third character in the title attribute is x,” or something like that. You could, however, write JavaScript that does this kind of transformation before YQL returns the data.
- You can access all open data on the Web, but if a website chooses to block YQL using the robots.txt directive, you won’t be allowed to access it. The same applies to data sources that require authentication or are hosted behind a firewall.
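For back-end use, the caching advice in the first limit can be sketched as a small in-memory wrapper. This is an illustration under assumptions, not a prescribed approach: createCachedFetcher, fetchFn and now are hypothetical names, and fetchFn stands for whatever function you already use to request a URL.

```javascript
// Hypothetical helper: wraps a function that fetches a URL and remembers
// each result for ttlMs milliseconds, so repeated calls within that window
// do not hit YQL again. The clock is injectable to make testing easy.
function createCachedFetcher(fetchFn, ttlMs, now) {
  var clock = now || Date.now;
  var cache = Object.create(null);
  return function (url) {
    var hit = cache[url];
    if (hit && clock() - hit.time < ttlMs) {
      return hit.value; // still fresh: reuse the stored result
    }
    var value = fetchFn(url);
    cache[url] = { time: clock(), value: value };
    return value;
  };
}

// Usage: cache every response for ten minutes.
// var cachedFetch = createCachedFetcher(myFetchFunction, 10 * 60 * 1000);
```

The wrapper stores whatever fetchFn returns, including a promise, so a failed request would be remembered until it expires; a fuller version would evict failures and bound the cache size.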