A. The NewsCred API is entirely RESTful, which means that it is solely pull-based. We only send you data when you initiate a request. That said, we have made it very simple for clients to make periodic requests for full updates.
The way to do this is to use two API parameters. The first is the modified_since parameter, which should contain the date and time of the last time you made a request. The second is the sort=modified_at parameter, which will cause articles to be ordered from oldest to newest, starting from the time you specify in the modified_since parameter.
A. You can use the pagesize and offset parameters to control how many articles come back in one response. The API will support a pagesize up to 999 articles, but in practice we recommend setting the pagesize to 100 articles or less to keep the response time to a minimum. The default pagesize is 10.
You can make subsequent calls to get more articles, by increasing the offset parameter. For example if your pagesize was 20 and you wanted the first 20 articles that match your search, you would send offset=0 (this is the default behavior). If you wanted the next 20 articles, you would send offset=20. For the 20 after that you would send offset=40, and so on.
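The offset arithmetic described above can be sketched as a small helper. This is only an illustration: page_offsets is a hypothetical name of ours, not part of the API, and it assumes you already know the total number of matching articles.

```python
def page_offsets(num_found, pagesize=10):
    """Yield the offset to send for each page needed to cover num_found articles.

    pagesize defaults to 10, the API's documented default page size.
    """
    offset = 0
    while offset < num_found:
        yield offset
        offset += pagesize


# With 45 matching articles and a pagesize of 20, three calls are needed.
offsets = list(page_offsets(num_found=45, pagesize=20))
print(offsets)
```

Each value would be sent as the offset parameter of one API call, alongside the same pagesize.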
A. You can check the num_found attribute on the top node of the API response, <article_set>.
A. Absolutely! Our API is 100% open, and we keep the documentation on our website's Developers' section, available at http://developer.newscred.com/developer/docs
A. We work hard to make our responses as fast as we can, but it’s possible to make your API calls even faster by requesting only a subset of the data we return by default. The following adjustments can have a particularly big impact on response times:
A. There are three ways to get images along with articles. Each has its own pros and cons:
This is the simplest option. Articles with images are automatically included in RSS-formatted API responses. We follow the industry-standard MRSS format, so if your RSS parser supports MRSS, this is a great option.
If you'd prefer to use XML or JSON, you can get the images through a second API call. When you retrieve a set of articles, one of the fields you'll have in the response is the article GUID. Once you have that, you can retrieve the images for each article by calling the article/GUID/images call, detailed here.
As an alternative to a second API call, you can use the fields parameter when you make your articles call, and make sure to request the article.image.* nodeset. Please note that if you use the fields parameter, you have to specify all the fields you want in the API response, not just the additional fields. That is to say, if you simply append fields=article.image.* to your API call, the only fields you'll get back will be the image data!
You can find more information about the fields parameter in the fields section of the FAQ.
A.There are two ways for publishers to send images with an article. The first way is for an image to be included with an article, without any placement information. Think of it as two separate items: a block of text which is the article, and a block of data which is the image. The two are associated, but not directly linked. These are the images that our has_images parameter is referring to. It's a flag to let you know that if you want them, you can get associated images through the article/GUID/images call (or the fields parameter, as explained in this question).
What you're seeing are "inline" images. These are images that are included as part of the text. These tend to be contextual, and their placement within the text is important (eg, in the case of an illustrated step-by-step how-to). Because these are part of the text (they're just HTML image tags) they should render automatically, and there's no particular need to flag them.
A. Not yet, but we're working on it.
A. Not yet, but we're working on it.
A. Not yet, but we're working on it.
A.Yes! If you subscribe to a standalone images package (that is, if you're licensing content from an images provider, separate from any articles), you can use our "smart images" feature to request images that match articles, even if the article's original publisher didn't include any.
You can accomplish this by sending the 'smart_images=true' parameter with your articles API call, and the subsequent article/GUID/images API call (if you're using the fields parameter or RSS-formatted responses, you won't need to make the subsequent API call).
A.Sometimes our API responses will contain more data than you need. This usually isn't a major problem - you can just ignore the fields you aren't interested in - but sometimes in the interest of speed or brevity you may want to not even have the extraneous data returned at all.
Conversely, there are certain data fields we don't return by default (such as article images), in the interest of keeping our response time to a minimum. In order to fine-tune which fields are returned in the API response, you can send the 'fields' parameter in the API request.
The value of your fields parameter should include a list of all the nodes you want us to return on the object type you're requesting. We use period-separated node trees, so for example every node of an article object will begin with "article.", while every node of an image object will begin with "image.".
You can traverse as far down the tree as you'd like. For example if you wanted the name of the article source but nothing else about the source you could request the article.source.name node. Please note that you do not need to specify the 'set' nodes. The correct format for the names of topics is article.topic.name, not article.topic_set.topic.name.
It's important to realize that this will override the default fields that we return, not add to or subtract from them. So if you do choose to use the fields parameter, you'll need to list all the fields you want us to return. This list can get long, and that's okay! A common fields query might look like this:
fields=article.title article.description article.guid article.published_at article.source.name article.topic.name article.topic.image_url article.topic.guid article.topic.topic_classification article.image.guid article.image.caption article.image.height article.image.width article.image.source.name article.image.urls.large article.author.name article.author.first_name article.author.last_name article.category.name article.link
A more minimal set of fields, which would make the API call run much faster (albeit with far less data), might look like this: fields=article.guid article.title article.description article.published_at article.source.name article.author.name
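If you keep your field names in a list, the fields value can be assembled and URL-encoded in one step. The sketch below is our own illustration, not an official client: build_fields_param is a hypothetical helper, and it assumes the space-separated format shown above, with spaces encoded as + in the query string (as in the query=president+obama style used elsewhere in this FAQ).

```python
from urllib.parse import urlencode


def build_fields_param(fields):
    """Join field paths into the space-separated value shown in the FAQ."""
    return " ".join(fields)


# A minimal field list like the one above.
minimal_fields = [
    "article.guid",
    "article.title",
    "article.description",
    "article.published_at",
    "article.source.name",
    "article.author.name",
]

# urlencode turns each space into "+"; xxxx is a placeholder access key.
query = urlencode({"access_key": "xxxx", "fields": build_fields_param(minimal_fields)})
url = "http://api.newscred.com/articles?" + query
print(url)
```

Because the fields parameter overrides the defaults, any field missing from the list will simply not appear in the response.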
A.What you're seeing is probably the descriptions of topics. Topics are used to describe what individual entities are discussed within an article. Each topic itself has some metadata associated with it, including a description. While this appears in a description node, it is a distinct node from the article description. The XML path for the article description will be /article_set/article/description, while the XML path for the topic descriptions will be /article_set/article/topic_set/topic/description (JSON paths follow a similar hierarchy).
If you would prefer to skip over the topic information altogether, you can send the API parameter get_topics=false in your articles search, and the entire topic_set node will be removed from the results. Alternatively if you'd like to only receive some of the topic metadata (for example if you want the topic name and GUID, but don't care for the description or classifications) you can use the fields parameter to control which topic fields are included in the response.
You can find more information about the fields parameter in the fields section of the FAQ.
A.The tracking pixel is a 1x1 transparent image (it's basically invisible) that we require our clients to display with the articles wherever they're posted. It allows us to keep track of who posts what content, and powers the analytics screens in our control panel, for both clients and publishers.
You'll generally see the tracking pixel in two places in the API response, although you only need to display it once. It will be inside its own XML/JSON node named tracking_pixel, but it's also included as an HTML tag in the article content (the description node) at the end. The tag will always have its class set to "nc_pixel" as well, to make it easier to find should you need to do any special processing on it.
As long as you aren't parsing images out of the article text, you shouldn't need to do any special processing in order to support the tracking pixel.
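If you do parse images out of the article text, the "nc_pixel" class lets you tell the tracking pixel apart from real inline images. The following is a minimal sketch using Python's standard html.parser module. It assumes the pixel is an img tag, and the sample HTML and its URLs are invented for illustration.

```python
from html.parser import HTMLParser


class ImageSorter(HTMLParser):
    """Separate the tracking pixel from ordinary inline images."""

    def __init__(self):
        super().__init__()
        self.pixel_srcs = []
        self.inline_image_srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":  # assumption: the pixel is an <img> tag
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "nc_pixel" in classes:
            self.pixel_srcs.append(attr_map.get("src"))
        else:
            self.inline_image_srcs.append(attr_map.get("src"))


# Hypothetical article content; real responses will differ.
description = (
    '<p>Step one.</p><img src="http://example.com/step1.jpg" />'
    '<img class="nc_pixel" src="http://example.com/pixel.gif" />'
)

sorter = ImageSorter()
sorter.feed(description)
print(sorter.pixel_srcs, sorter.inline_image_srcs)
```

Whatever processing you apply to the inline images, the pixel itself still needs to be displayed once with the article.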
A.(Note: This section is specifically about the score attribute of the topic object, not to be confused with an article's score attribute, discussed in the next question.)
When topics are assigned to an article, they are ranked against each other and scored on their relevancy within the article. The most relevant topic for an article will always be assigned a score of 1.0, the second-most-relevant will have a fractionally lower score which is the second highest among that set of topics, etc.
It's important to understand that these scores are only meaningful within the context of the single article, however. If you have two articles that both have the topic Abraham Lincoln, one with the score 0.75 and one with the score 0.70, it shouldn't be taken as an indication as to which article is "more about" Abraham Lincoln. It's just a comparison against the other topics for those two respective articles.
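One way to respect this in code is to look at a topic's rank within each article rather than comparing raw scores across articles. The data below is invented for illustration, and rank_within_article is a hypothetical helper of ours.

```python
def rank_within_article(topic_scores, topic_name):
    """Return the 1-based rank of topic_name among one article's topics."""
    ordered = sorted(topic_scores, key=topic_scores.get, reverse=True)
    return ordered.index(topic_name) + 1


# Hypothetical topic scores for two articles.
article_a = {"Civil War": 1.0, "Abraham Lincoln": 0.75}
article_b = {"Gettysburg": 1.0, "Pennsylvania": 0.9, "Abraham Lincoln": 0.70}

rank_a = rank_within_article(article_a, "Abraham Lincoln")
rank_b = rank_within_article(article_b, "Abraham Lincoln")
print(rank_a, rank_b)
```

The ranks say where the topic sits among each article's own topics. Neither the ranks nor the raw scores say which article is "more about" the topic.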
A.(Note: This section is specifically about the score attribute of the article object, not to be confused with a topic's score attribute, which is discussed in the previous question.)
Article scores are an approximate calculation of how relevant an article is to the search parameters that were sent in the query. It's a combination of the frequency of the search terms, how new the article is, the size of the publication, and a number of other factors. The score for an article will change accordingly, every time you change your search parameters.
In practice, it tends not to be used very much. If you're more concerned about relevancy than recency, you're better off sending the sort=relevance parameter and taking the top results that come back from that.
A.Metadata varies from source to source, and is mostly used for internal troubleshooting within NewsCred. You're welcome to read through it, but there tends not to be much there. If you'd rather not receive this data at all, you can send the get_metadata=false parameter.
A.Each of our article nodes contains a link node. It's an unfortunate naming conflict, as this really isn't meant to be a web link at all. It's a unique string we get from our publishers, used to identify articles in their own system. Sometimes publishers use a URL as the identifier, but not always. Even when they do, the URL isn't always publicly accessible (eg, it may only be available from the publisher's internal Intranet). The primary purpose of this field is for us to troubleshoot issues with our publishers. We include it by default because we believe it's better to have more debugging info rather than less, but it's not a field which serves much purpose outside of that.
A.We integrate with Drupal 6.x and 7.x by providing RSS feeds. The basic "out of the box" Drupal installation is quite bare-bones, however, and requires a number of modules in order to import data. Please see our separate Drupal integration guide for a full walkthrough.
A.The NewsCred Wordpress plugin seamlessly integrates NewsCred content with your Wordpress site. The plugin offers two main modes of operation:
Once you’ve installed the plugin, you can search for content from NewsCred, from directly within your site’s Add New Post page. You can filter by source, topic, category, or text, just like you can through the API or our Content Explorer. Once you find an article (or image) you like, you can add it directly to your post.
The plugin can also store API calls to request on a regular basis, and add new posts automatically. These posts can be published on your site as soon as they’re imported, or saved into a drafts folder for you to review later.
You can use either or both modes on your Wordpress site. The plugin can also be configured to save metadata like topics and categories, auto-assign images, scale images, and more.
To install the plugin, download it from the Wordpress Plugin Directory, or search for and install it directly from within your Wordpress admin panel.
A.The best way to integrate with Joomla is to write a custom module to import NewsCred articles (in either XML or JSON) and create articles directly through the module. Because Joomla installations tend to be so heavily customized, we don't offer a standard NewsCred module for this.
As a stopgap, you can use an off-the-shelf RSS module such as FeedGator to import NewsCred articles in RSS format. These modules offer much less flexibility and frequently have bugs surrounding images and formatting in general, but they do offer a faster turnaround time than custom coding.
A.When we return articles, by default we include a list of topics within each <article> node. Inside each of these topics is a GUID, which you can use to retrieve more content relative to that topic. For example, by calling the topic/GUID/articles API endpoint, you can get articles related to this topic. By calling topic/GUID/images, you can retrieve images related to this topic.
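As a rough sketch, the two topic endpoints can be built from a topic GUID as shown below. The URL layout is an assumption modeled on the article/GUID/images example elsewhere in this FAQ; topic_related_url and the GUID "abc123" are hypothetical.

```python
API_BASE = "http://api.newscred.com"


def topic_related_url(topic_guid, kind, access_key="xxxx"):
    """Build the URL for topic/GUID/articles or topic/GUID/images."""
    if kind not in ("articles", "images"):
        raise ValueError("kind must be 'articles' or 'images'")
    return "%s/topic/%s/%s?access_key=%s" % (API_BASE, topic_guid, kind, access_key)


# "abc123" stands in for a GUID taken from a topic node in an API response.
articles_url = topic_related_url("abc123", "articles")
images_url = topic_related_url("abc123", "images")
print(articles_url)
print(images_url)
```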
A. These code examples are meant to be just that - examples. They don't use a valid access key and may occasionally throw exceptions. If you do copy and paste these directly into your code, you are doing so at your own risk!
All these samples are written in Python, but the concepts should carry across to other languages without much trouble. We tried to avoid Python-specific structures wherever we were able to. We also take some pseudocode liberties with parsing the XML, since every programming language has its own library (or libraries) for doing this.
A.We're going to assume that the current time is 2012-01-01 12:00:00 UTC (all our timestamps and time fields are always UTC), and the last time we made a request was earlier in the day at 11:30:00 UTC. We're also going to assume that the number of new articles is more than our preferred pagesize (10 articles), so we'll have to make several page requests to get all the articles. It's also important that you update your num_found on every iteration through the loop -- if we ingest new articles that match your search while you're retrieving content, the num_found will increase from one call to the next. You'll want to take the new value, to make sure you get all the new content!
previous_request_time = "2012-01-01%2011:30:00"  # note that we URL-escape the space to a %20 character
pagesize = 10  # how many articles we request back in each call
offset = 0  # as we iterate through, this will increase by the number of articles we request
request_url_base = "http://api.newscred.com/articles?access_key=xxxx&pagesize=" + str(pagesize) + "&fulltext=true&sort=modified&modified_since=" + previous_request_time
while True:
    request_url = request_url_base + "&offset=" + str(offset)  # offset is a number, so convert it before concatenating
    XMLTree = parse_xml(retrieve_url(request_url))
    # this is the total number of articles which match the query; convert it to a number so we can compare against it
    num_found = int(XMLTree.get_node("/article_set/@num_found"))
    for article in XMLTree.get_node("/article_set/article"):
        article_text = article.get_node("description/text()")
        article_title = article.get_node("title/text()")
        article_source = article.get_node("source/name/text()")
        # ... extract other nodes here
    offset = offset + pagesize  # Compute the number of articles we've requested so far
    if offset >= num_found:  # If we're at or past the total number, exit the loop
        break
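For readers who want something they can run, here is one possible concrete version of the same loop using Python's standard xml.etree.ElementTree module. It is a sketch under stated assumptions: the response layout is inferred from the paths used above, fetch_all_articles is our own name, and fake_fetch is a stand-in that fabricates tiny responses so the loop can be exercised without a network call or a valid access key.

```python
import xml.etree.ElementTree as ET


def fetch_all_articles(fetch, base_url, pagesize=10):
    """Page through results, re-reading num_found on every response."""
    articles = []
    offset = 0
    while True:
        url = base_url + "&pagesize=%d&offset=%d" % (pagesize, offset)
        root = ET.fromstring(fetch(url))  # root is the <article_set> node
        num_found = int(root.get("num_found"))
        for article in root.findall("article"):
            articles.append({
                "title": article.findtext("title"),
                "text": article.findtext("description"),
                "source": article.findtext("source/name"),
            })
        offset += pagesize
        if offset >= num_found:
            break
    return articles


def fake_fetch(url):
    """Stand-in for an HTTP GET: serves 3 made-up articles, 2 per page."""
    offset = int(url.rsplit("offset=", 1)[1])
    total = 3
    items = "".join(
        "<article><title>T%d</title><description>D%d</description>"
        "<source><name>S</name></source></article>" % (i, i)
        for i in range(offset, min(offset + 2, total))
    )
    return '<article_set num_found="%d">%s</article_set>' % (total, items)


result = fetch_all_articles(
    fake_fetch, "http://api.newscred.com/articles?access_key=xxxx", pagesize=2
)
print([a["title"] for a in result])
```

Passing the fetch function in as an argument keeps the paging logic separate from the HTTP library you choose, which also makes it easy to test.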
A.For the purposes of this example, we're going to skip the looping of the previous example (making multiple requests in order to get all the articles that the search found), and instead make just one request for articles and then get the images for each of those articles.
request_url = "http://api.newscred.com/articles?access_key=xxxx&pagesize=10&fulltext=true&sort=date&query=president+obama"
XMLTree = parse_xml(retrieve_url(request_url))
for article in XMLTree.get_node("/article_set/article"):
    article_guid = article.get_node("./guid/text()")  # this is the GUID of a single article
    # Make a second request for the images that are associated with this individual article, by calling article/GUID/images, with your access key
    image_request_url = "http://api.newscred.com/article/" + article_guid + "/images?access_key=xxxx"
    ImageXMLTree = parse_xml(retrieve_url(image_request_url))
    # Go through the image nodes in the response of that API call, and process them with that article.
    all_images = []
    for image in ImageXMLTree.get_node("/image_set/image"):
        image_url = image.get_node("urls/large/text()")
        image_data = retrieve_url(image_url)  # retrieve the image itself
        image_source = image.get_node("source/name/text()")
        # Parse other nodes here
        all_images.append((image_url, image_data, image_source))  # collect this image for the article
    process_article_with_images(article, all_images)
A. There are two steps to looking up articles by topic. The first is to find the correct topic, and the second is to request articles that are tagged with that topic's GUID.
It's important to realize that a topic search can return several topics. For example, a search for "New York" will return a topic for New York City, New York State, New York Bay, New York Yankees, and anything else with "New York" in the name. In this example, we'll ask a user to select which topic they're interested in, but in reality you may want to do your topic lookups ahead of time and save the topic GUIDs!
import sys

topic_request_url = "http://api.newscred.com/topics?access_key=xxxx&query=barack+obama"
XMLTree = parse_xml(retrieve_url(topic_request_url))
topic_list = []
for topic in XMLTree.get_node("/topic_set/topic"):
    topic_name = topic.get_node("name/text()")
    topic_guid = topic.get_node("guid/text()")
    # Parse other nodes here
    topic_list.append([topic_name, topic_guid])
    # Print the topic as part of a list, for the user to choose from
    print("Topic %d: Name %s, Guid %s" % (len(topic_list) - 1, topic_name, topic_guid))
# Ask the user what topic to use
sys.stdout.write("What topic should we use? ")
sys.stdout.flush()
topic_choice = int(sys.stdin.readline())  # convert the typed number into a list index
chosen_topic_guid = topic_list[topic_choice][1]  # get the chosen GUID from the list
article_request_url = "http://api.newscred.com/articles?access_key=xxxx&topics=" + chosen_topic_guid
ArticleXMLTree = parse_xml(retrieve_url(article_request_url))
for article in ArticleXMLTree.get_node("/article_set/article"):
    process_article(article)
    # This is where you would do your article processing.
A.Individual sources are specified by GUID, but source lists are specified by name. You'll see this same pattern with topics and topic lists, as well as articles and article lists. This example requests articles from the Los Angeles Times, and then it requests articles from a (fictional) source list named "Los Angeles Newspapers."
# First, look up the source GUID for the LA Times
source_lookup_url = "http://api.newscred.com/sources?access_key=xxxx&query=los+angeles+times"
XMLTree = parse_xml(retrieve_url(source_lookup_url))
source_guid = XMLTree.get_node("/source_set/source[1]/guid/text()")
# Second, get articles from that one source
articles_lookup_url = "http://api.newscred.com/articles?access_key=xxxx&sources=" + source_guid
XMLTree = parse_xml(retrieve_url(articles_lookup_url))
article_list = XMLTree.get_node("/article_set/article")
# Now, look up articles from any source in the source list named "Los Angeles Newspapers"
articles_lookup_url = "http://api.newscred.com/articles?access_key=xxxx&source_filter_mode=whitelist&source_filter_name=Los+Angeles+Newspapers"
XMLTree = parse_xml(retrieve_url(articles_lookup_url))  # fetch the articles URL, not the source lookup URL
article_list = XMLTree.get_node("/article_set/article")
A. NewsCred puts three timestamps on each of our articles, images, and videos:
The created_at timestamp records the time NewsCred receives the content from the original publisher. Content is generally available through our API within five minutes of us receiving it.
The modified_at timestamp records the time NewsCred receives an update for this same piece of content, or a copy of the created_at timestamp if we haven’t received any updates for it. Updates are most frequent on breaking news articles or financial news articles (where stock prices can be updated throughout the day), but any article is eligible to be modified.
The published_at timestamp records the date and time that the original publisher published the article. This is usually very close to the time we receive the article, but not always. Some weekly print publications will move articles online in the days before or days after the print version is released, and in this case the timestamps could differ by several days. In the case of prolonged outages, we may also receive articles that were published several days in the past.
A. If you’re looking for articles from a specific time (for example, if you want to see what the top stories were on New Year’s Day), you can use the from_date and to_date parameters to specify the boundaries for the publication date (ie, the published_at value). This would look something like from_date=2013-01-01&to_date=2013-01-02 (if you don’t specify the time of day, we assume midnight). This will return all the articles published on New Year’s Day (midnight UTC on January 1, through midnight UTC on January 2).
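A single-day window like this one can be computed rather than typed by hand. The sketch below is illustrative only: day_range_params is a hypothetical helper, and it relies on the documented behavior that a date without a time of day means midnight UTC.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def day_range_params(day):
    """Return from_date/to_date covering one whole day, midnight to midnight."""
    next_day = day + timedelta(days=1)
    return {"from_date": day.isoformat(), "to_date": next_day.isoformat()}


params = day_range_params(date(2013, 1, 1))
# xxxx is a placeholder access key.
url = "http://api.newscred.com/articles?" + urlencode(dict(access_key="xxxx", **params))
print(url)
```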
If you’re looking for all the new articles we’ve received in our system since the last time you made a request, you should use the modified_since parameter. The easiest way to do this is to send either the time of your last API call (in UTC) or a relative time based on how frequently you make API calls. For example, if you were polling the API every 30 minutes, you could send modified_since=-30MIN.
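Both forms of the modified_since value can be produced with a few lines of code. This is a sketch with two assumptions: the absolute format follows the "2012-01-01 11:30:00" timestamp used in the code samples in this FAQ, and the relative form generalizes the single -30MIN example to other minute counts, which is not confirmed here. Both function names are our own.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def absolute_modified_since(last_request_utc):
    """Format a UTC datetime as a URL-escaped modified_since value."""
    stamp = last_request_utc.strftime("%Y-%m-%d %H:%M:%S")
    return quote(stamp, safe=":")  # escape the space as %20, keep the colons


def relative_modified_since(minutes):
    """Build a relative value such as -30MIN (assumed pattern)."""
    return "-%dMIN" % minutes


last_request = datetime(2012, 1, 1, 11, 30, 0, tzinfo=timezone.utc)
absolute = absolute_modified_since(last_request)
relative = relative_modified_since(30)
print(absolute, relative)
```

The absolute form is the safer choice if your polling schedule can drift, since a relative window that is shorter than the real gap between calls would miss articles.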
A. This depends on how many sources you’re licensing from us, and how many articles these sources publish. There’s no one size fits all here, and frequently it takes a little experimentation to get it right. We do have some rough guidelines, however: