Developer Center

Frequently Asked Questions

Table of Contents

API Requests
1.1 How can I get all the articles? Do you have a push mechanism?
1.2 What happens if there are more new articles than can be returned in one response?
1.3 How do I know how many articles match my search?
1.4 Do you have documentation for all your API calls?
1.5 How can I make my requests as fast as possible?
Images
2.1 How can I get images with the articles?
2.2 Why am I getting images inside articles even though the has_images parameter is false?
2.3 Can I search specifically for articles with (or without) inline images?
2.4 Can I have the inline images as attached images, so that I can use the same code to retrieve all the images?
2.5 Can I have the non-inline, attached images placed inline in the article?
2.6 Can I get images that make sense with articles that don't come with their own images?
2.7 How can I limit the data that's returned in a response (aka, what is the fields parameter)?
API Responses
3.1 Why am I seeing Wikipedia entries and other descriptions in my search results? Shouldn't the response be news articles?
3.2 What is the tracking pixel?
3.3 What is the topic score?
3.4 What is the article score?
3.5 What is the metadata field used for?
3.6 What is the link node? Why doesn't it contain an actual link?
CMS Integration
4.1 How does NewsCred integrate with Drupal?
4.2 How does NewsCred integrate with WordPress?
4.3 How does NewsCred integrate with Joomla?
Other content related
5.1 How can I find other content related to an article?
Code Examples
6.1 About this Section
6.2 Retrieve all the new articles since the last time you made a query
6.3 Retrieve new articles and their images
6.4 Look up a topic and request articles
6.5 Request articles from a single source, and then a source list
Times, Time Ranges, and Timestamps
7.1 What are the different timestamps?
7.2 How can I search for articles from a specific time window?
7.3 How frequently should I poll the API to make sure I always have the most up to date articles?

API Requests

Q. How can I get all the articles? Do you have a push mechanism?

A. The NewsCred API is entirely RESTful, which means that it is solely pull-based. We only send you data when you initiate a request. That said, we have made it very simple for clients to make periodic requests for full updates.

The way to do this is to use two API parameters. The first is the modified_since parameter, which should contain the date and time of the last time you made a request. The second is the sort=modified_at parameter, which will cause articles to be ordered from oldest to newest, starting from the time you specify in the modified_since parameter.
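As a rough sketch of the two-parameter approach described above, the following Python builds a polling URL. The endpoint and the xxxx access key are the placeholders used in the Code Examples section of this FAQ, and the helper name build_update_url is made up for illustration. The sort value follows the text above; the later code example sends sort=modified, so check the API documentation for the exact value.

```python
from urllib.parse import urlencode, quote

API_BASE = "http://api.newscred.com/articles"  # endpoint as used in this FAQ's examples


def build_update_url(access_key, last_request_time):
    """Build an articles URL for everything modified since the last poll."""
    params = {
        "access_key": access_key,
        "modified_since": last_request_time,  # UTC, e.g. "2012-01-01 11:30:00"
        "sort": "modified_at",                # oldest to newest, per the text above
    }
    # quote_via=quote escapes the space as %20; safe=":" leaves the colons readable
    return API_BASE + "?" + urlencode(params, safe=":", quote_via=quote)


print(build_update_url("xxxx", "2012-01-01 11:30:00"))
```

After each successful poll, you would store the time of that request and pass it as last_request_time on the next one.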

Q. What happens if there are more new articles than can be returned in one response?

A. You can use the pagesize and offset parameters to control how many articles come back in one response. The API will support a pagesize up to 999 articles, but in practice we recommend setting the pagesize to 100 articles or less to keep the response time to a minimum. The default pagesize is 10.

You can make subsequent calls to get more articles, by increasing the offset parameter. For example if your pagesize was 20 and you wanted the first 20 articles that match your search, you would send offset=0 (this is the default behavior). If you wanted the next 20 articles, you would send offset=20. For the 20 after that you would send offset=40, and so on.

Q. How do I know how many articles match my search?

A. You can check the num_found attribute on the top node of the API response, <article_set>.
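To make the paging arithmetic concrete, here is a minimal sketch that turns a num_found value and a pagesize into the list of offset values to send. The function name page_offsets is hypothetical, and it assumes num_found has already been read from the response and converted to an integer.

```python
def page_offsets(num_found, pagesize=10):
    """Return the offset to send for each page needed to cover num_found articles."""
    return list(range(0, num_found, pagesize))


# 55 matching articles at 20 per page need three requests.
print(page_offsets(55, pagesize=20))  # → [0, 20, 40]
```

Because num_found can grow while you are paging, it is safer to re-read it from each response than to compute the full list of offsets once up front.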

Q. Do you have documentation for all your API calls?

A. Absolutely! Our API is 100% open, and we keep the documentation on our website's Developers' section, available at http://developer.newscred.com/developer/docs

Q. How can I make my requests as fast as possible?

A. We work hard to make our responses as fast as we can, but it’s possible to make your API calls even faster by requesting only a subset of the data we return by default. The following adjustments can have a particularly big impact on response times:

  • Use the fields parameter to limit the data we return. For example, if you aren’t using our category classifications, then you can leave out the entire article.category group. If you aren’t doing anything with our topics, you can save time by not requesting anything within the article.topic group. The topic nodes and metadata nodes are particularly big, and if you aren’t using these you can skip over them for significant time savings. Even if you’re using some topic info but not all of it (eg, if you want to know the name of the topic but you aren’t using topic images), specifying some but not all topic fields will still improve response time. For more information on the fields parameter, please see the fields section of the FAQ.
  • Don’t go back more than 30 days. We keep content from the last 30 days in a smaller, faster database, since it’s the most heavily referenced. We don’t go into the older database unless we have to, but to explicitly prevent your query from going into the larger (and slower) database of older news, you can send from_date=-29DAY with your requests.
  • Limit your pagesize. A larger pagesize will lead to more data retrieval, which will take longer for us to return. If you don’t specify a pagesize we’ll return 10 articles by default.
  • Try enabling compression on your HTTP request. This will have the biggest effect on large data transfers (eg, lots of articles with all the metadata, topics, and other details) going over slow connections. In some cases, this can actually make response time worse. The time it takes to compress and decompress can be more than the time savings in network transmission, especially if you’re already minimizing the data through the strategies above. Your mileage may vary!
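The adjustments above can be combined in a single request. The sketch below only prepares the request and does not send it. The function name build_fast_request is hypothetical, and it assumes that "enabling compression" means sending the standard Accept-Encoding: gzip HTTP header, which this FAQ does not spell out.

```python
from urllib.parse import urlencode, quote
from urllib.request import Request


def build_fast_request(access_key, fields, pagesize=10):
    """Prepare (but do not send) an articles request tuned for speed."""
    params = {
        "access_key": access_key,
        "fields": " ".join(fields),  # only the nodes we actually use
        "from_date": "-29DAY",       # stay inside the faster 30-day database
        "pagesize": pagesize,        # small pages return sooner
    }
    url = "http://api.newscred.com/articles?" + urlencode(params, quote_via=quote)
    # Assumption: compression is requested with the standard Accept-Encoding header.
    return Request(url, headers={"Accept-Encoding": "gzip"})


request = build_fast_request("xxxx", ["article.guid", "article.title"])
print(request.full_url)
```

Note that Python's urllib does not decompress responses automatically; if the response comes back with Content-Encoding: gzip, you would pass the body through gzip.decompress yourself.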

Images

Q.How can I get images with the articles?

A. There are three ways to get images along with articles. Each has its own pros and cons:

  • RSS Integration

    This is the simplest option. Articles with images are automatically included in RSS-formatted API responses. We follow the industry-standard MRSS format, so if your RSS parser supports MRSS, this is a great option.

  • Make an article/GUID/images call

    If you'd prefer to use XML or JSON, you can get the images through a second API call. When you retrieve a set of articles, one of the fields you'll have in the response is the article GUID. Once you have that, you can retrieve the images for each article by calling the article/GUID/images call, which is described in our API documentation.

  • Use the fields parameter

    As an alternative to a second API call, you can use the fields parameter when you make your articles call, and make sure to request the article.image.* nodeset. Please note that if you use the fields parameter, you have to specify all the fields you want in the API response, not just the additional fields. That is to say, if you simply append fields=article.image.* to your API call, the only fields you'll get back will be the image data!

    You can find more information about the fields parameter in the fields section of the FAQ.

Q.Why am I getting images inside articles even though the has_images parameter is false?

A. There are two ways for publishers to send images with an article. The first way is for an image to be included with an article, without any placement information. Think of it as two separate items: a block of text which is the article, and a block of data which is the image. The two are associated, but not directly linked. These are the images that our has_images parameter is referring to. It's a flag to let you know that if you want them, you can get associated images through the article/GUID/images call (or the fields parameter, as explained in the previous question).

What you're seeing are "inline" images. These are images that are included as part of the text. These tend to be contextual, and their placement within the text is important (eg, in the case of an illustrated step-by-step how-to). Because these are part of the text (they're just HTML image tags) they should render automatically, and there's no particular need to flag them.

Q.Can I search specifically for articles with (or without) inline images?

A.Not yet, but we're working on it.

Q.Can I have the inline images as attached images, so that I can use the same code to retrieve all the images?

A.Not yet, but we're working on it.

Q.Can I have the non-inline, attached images placed inline in the article?

A.Not yet, but we're working on it.

Q.Can I get images that make sense with articles that don't come with their own images?

A.Yes! If you subscribe to a standalone images package (that is, if you're licensing content from an images provider, separate from any articles), you can use our "smart images" feature to request images that match articles, even if the article's original publisher didn't include any.

You can accomplish this by sending the 'smart_images=true' parameter with your articles API call, and the subsequent article/GUID/images API call (if you're using the fields parameter or RSS-formatted responses, you won't need to make the subsequent API call).

Q.How can I limit the data that's returned in a response (aka, what is the fields parameter)?

A.Sometimes our API responses will contain more data than you need. This usually isn't a major problem - you can just ignore the fields you aren't interested in - but sometimes in the interest of speed or brevity you may want to not even have the extraneous data returned at all.

Conversely, there are certain data fields we don't return by default (such as article images), in the interest of keeping our response time to a minimum. In order to fine-tune which fields are returned in the API response, you can send the 'fields' parameter in the API request.

The value of your fields parameter should include a list of all the nodes you want us to return on the object type you're requesting. We use period-separated node trees, so for example every node of an article object will begin with "article.", while every node of an image object will begin with "image.".

You can traverse as far down the tree as you'd like. For example if you wanted the name of the article source but nothing else about the source you could request the article.source.name node. Please note that you do not need to specify the 'set' nodes. The correct format for the names of topics is article.topic.name, not article.topic_set.topic.name.

It's important to realize that this will override the default fields that we return, not add to or subtract from them. So if you do choose to use the fields parameter, you'll need to list all the fields you want us to return. This list can get long, and that's okay! A common fields query might look like this:

fields=article.title article.description article.guid article.published_at article.source.name article.topic.name article.topic.image_url article.topic.guid article.topic.topic_classification article.image.guid article.image.caption article.image.height article.image.width article.image.source.name article.image.urls.large article.author.name article.author.first_name article.author.last_name article.category.name article.link

A more minimal set of fields, which would make the API call run much faster (albeit with far less data), might look like this: fields=article.guid article.title article.description article.published_at article.source.name article.author.name
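Because the fields parameter replaces the defaults rather than adding to them, it can help to keep your field lists in code and join them when building the request. The sketch below is one way to do that. The helper name fields_value and the two list names are hypothetical, and the node names are taken from the examples above.

```python
from urllib.parse import quote

BASE_FIELDS = [
    "article.guid", "article.title", "article.description",
    "article.published_at", "article.source.name", "article.author.name",
]
IMAGE_FIELDS = [
    "article.image.guid", "article.image.caption", "article.image.urls.large",
]


def fields_value(*groups):
    """Join node names into one space-separated, URL-escaped fields value."""
    names = []
    for group in groups:
        for name in group:
            if name not in names:  # keep the original order, drop duplicates
                names.append(name)
    return quote(" ".join(names))


# Adding images means sending the base fields AND the image fields together.
print("fields=" + fields_value(BASE_FIELDS, IMAGE_FIELDS))
```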

API Responses

Q.Why am I seeing Wikipedia entries and other descriptions in my search results? Shouldn't the response be news articles?

A.What you're seeing is probably the descriptions of topics. Topics are used to describe what individual entities are discussed within an article. Each topic itself has some metadata associated with it, including a description. While this appears in a description node, it is a distinct node from the article description. The XML path for the article description will be /article_set/article/description, while the XML path for the topic descriptions will be /article_set/article/topic_set/topic/description (JSON paths follow a similar hierarchy).

If you would prefer to skip over the topic information altogether, you can send the API parameter get_topics=false in your articles search, and the entire topic_set node will be removed from the results. Alternatively if you'd like to only receive some of the topic metadata (for example if you want the topic name and GUID, but don't care for the description or classifications) you can use the fields parameter to control which topic fields are included in the response.

You can find more information about the fields parameter in the fields section of the FAQ.
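To illustrate the difference between the two description nodes, here is a small sketch using Python's standard ElementTree parser. The sample XML is invented for this example and only mirrors the paths quoted above; a real response will contain many more nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed response that follows the paths described above.
SAMPLE = """
<article_set num_found="1">
  <article>
    <description>Text of the news article.</description>
    <topic_set>
      <topic>
        <name>Abraham Lincoln</name>
        <description>Encyclopedia-style summary of the topic.</description>
      </topic>
    </topic_set>
  </article>
</article_set>
"""


def split_descriptions(xml_text):
    """Separate each article's own description from its topic descriptions."""
    root = ET.fromstring(xml_text.strip())  # root is <article_set>
    results = []
    for article in root.findall("article"):
        results.append({
            # direct child only: /article_set/article/description
            "article": article.findtext("description"),
            # nested: /article_set/article/topic_set/topic/description
            "topics": [t.findtext("description")
                       for t in article.findall("topic_set/topic")],
        })
    return results


print(split_descriptions(SAMPLE))
```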

Q.What is the tracking pixel?

A.The tracking pixel is a 1x1 transparent image (it's basically invisible) that we require our clients to display with the articles wherever they're posted. It allows us to keep track of who posts what content, and powers the analytics screens in our control panel, for both clients and publishers.

You'll generally see the tracking pixel in two places in the API response, although you only need to display it once. It will be inside its own XML/JSON node named tracking_pixel, but it's also included as an HTML tag in the article content (the description node) at the end. The tag will always have its class set to "nc_pixel" as well, to make it easier to find should you need to do any special processing on it.

As long as you aren't parsing images out of the article text, you shouldn't need to do any special processing in order to support the tracking pixel.
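If you do parse images out of the article text, the nc_pixel class gives you a way to keep the tracking pixel separate from real inline images. The following is a minimal sketch using Python's standard html.parser module. It assumes the pixel is delivered as an img tag, and the sample markup and URLs are invented for illustration.

```python
from html.parser import HTMLParser


class ImageSorter(HTMLParser):
    """Collect img sources, keeping the tracking pixel apart from inline images."""

    def __init__(self):
        super().__init__()
        self.inline_images = []
        self.tracking_pixels = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        target = self.tracking_pixels if "nc_pixel" in classes else self.inline_images
        target.append(attributes.get("src"))


def sort_images(html_text):
    sorter = ImageSorter()
    sorter.feed(html_text)
    sorter.close()
    return sorter.inline_images, sorter.tracking_pixels


# Hypothetical article content: one inline image followed by the pixel.
sample = ('<p>Step one.</p><img src="http://example.com/step1.jpg">'
          '<img class="nc_pixel" src="http://example.com/pixel.gif" width="1" height="1">')
print(sort_images(sample))
```

Whatever processing you apply to the inline images, the pixel itself still needs to be displayed once alongside the article.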

Q.What is the topic score?

A.(Note: This section is specifically about the score attribute of the topic object, not to be confused with an article's score attribute, discussed in the next question.)

When topics are assigned to an article, they are ranked against each other and scored on their relevancy within the article. The most relevant topic for an article will always be assigned a score of 1.0, the second-most-relevant topic will have a slightly lower score, and so on down the ranking.

It's important to understand that these scores are only meaningful within the context of the single article, however. If you have two articles that both have the topic Abraham Lincoln, one with the score 0.75 and one with the score 0.70, it shouldn't be taken as an indication as to which article is "more about" Abraham Lincoln. It's just a comparison against the other topics for those two respective articles.

Q.What is the article score?

A.(Note: This section is specifically about the score attribute of the article object, not to be confused with a topic's score attribute, which is discussed in the previous question.)

Article scores are an approximate calculation of how relevant an article is to the search parameters that were sent in the query. The score combines the frequency of the search terms, how new the article is, the size of the publication, and a number of other factors. Because it depends on the query, the score for an article will change every time you change your search parameters.

In practice, it tends not to be used very much. If you're more concerned about relevancy than recency, you're better off sending the sort=relevance parameter and taking the top results that come back from that.

Q.What is the metadata field used for?

A.Metadata varies from source to source, and is mostly used for internal troubleshooting within NewsCred. You're welcome to read through it, but there tends not to be much there. If you'd rather not receive this data at all, you can send the get_metadata=false parameter.

Q.What is the link node? Why doesn't it contain an actual link?

A. Each of our article nodes contains a link node. It's an unfortunate naming conflict, as this really isn't meant to be a web link at all. It's a unique string we get from our publishers, used to identify articles in their own system. Sometimes publishers use a URL as the identifier, but not always. Even when they do, the URL isn't always publicly accessible (eg, it may only be available from the publisher's intranet). The primary purpose of this field is for us to troubleshoot issues with our publishers. We include it by default because we believe it's better to have more debugging info rather than less, but it's not a field which serves much purpose outside of that.

CMS Integration

Q.How does NewsCred integrate with Drupal?

A.We integrate with Drupal 6.x and 7.x by providing RSS feeds. The basic "out of the box" Drupal installation is quite bare-bones, however, and requires a number of modules in order to import data. Please see our separate Drupal integration guide for a full walkthrough.

Q.How does NewsCred integrate with WordPress?

A. The NewsCred WordPress plugin seamlessly integrates NewsCred content with your WordPress site. The plugin offers two main modes of operation:

  • Search for articles, images, and videos

    Once you’ve installed the plugin, you can search for content from NewsCred, from directly within your site’s Add New Post page. You can filter by source, topic, category, or text, just like you can through the API or our Content Explorer. Once you find an article (or image) you like, you can add it directly to your post.

  • Schedule regular updates

    The plugin can also store API calls to request on a regular basis, and add new posts automatically. These posts can be published on your site as soon as they’re imported, or saved into a drafts folder for you to review later.

You can use either or both modes on your WordPress site. The plugin can also be configured to save metadata like topics and categories, auto-assign images, scale images, and more.

To install the plugin, download it from the WordPress Plugin Directory, or search for and install it directly from within your WordPress admin panel.

Q.How does NewsCred integrate with Joomla?

A.The best way to integrate with Joomla is to write a custom module to import NewsCred articles (in either XML or JSON) and create articles directly through the module. Because Joomla installations tend to be so heavily customized, we don't offer a standard NewsCred module for this.

As a stopgap, you can use an off-the-shelf RSS module such as FeedGator to import NewsCred articles in RSS format. These modules offer much less flexibility and frequently have bugs surrounding images and formatting in general, but they do offer a faster turnaround time than custom coding.

Other content related

Q.How can I find other content related to an article?

A. When we return articles, by default we include a list of topics within each <article> node. Inside each of these topics is a GUID, which you can use to retrieve more content related to that topic. For example, by calling the topic/GUID/articles API endpoint, you can get articles related to this topic. By calling topic/GUID/images, you can retrieve images related to this topic.
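As a small sketch of the lookup described above, the helper below turns a topic GUID into the two related-content URLs. The function name related_content_urls is hypothetical, and the host and access_key query string are assumed to follow the same pattern as the article/GUID/images example in the Code Examples section.

```python
from urllib.parse import quote

API_ROOT = "http://api.newscred.com"  # host as used in this FAQ's examples


def related_content_urls(topic_guid, access_key):
    """Build the topic/GUID/articles and topic/GUID/images URLs for one topic."""
    base = "%s/topic/%s" % (API_ROOT, quote(topic_guid, safe=""))
    return {
        "articles": base + "/articles?access_key=" + access_key,
        "images": base + "/images?access_key=" + access_key,
    }


# "abc123" stands in for a GUID read from an article's topic list.
print(related_content_urls("abc123", "xxxx"))
```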

Code Examples

Q.About this Section

A.These code examples are meant to be just that - examples. They don't use a valid access key and may occasionally throw exceptions. If you do copy and paste these directly into your code, you are doing so at your own risk!

All these samples are written in Python, but the concepts should carry across to other languages without much trouble. We tried to avoid Python-specific structures wherever we were able to. We also take some pseudocode liberties with parsing the XML, since every programming language has its own library (or libraries) for doing this.

Q.Retrieve all the new articles since the last time you made a query

A.We're going to assume that the current time is 2012-01-01 12:00:00 UTC (all our timestamps and time fields are always UTC), and the last time we made a request was earlier in the day at 11:30:00 UTC. We're also going to assume that the number of new articles is more than our preferred pagesize (10 articles), so we'll have to make several page requests to get all the articles. It's also important that you update your num_found on every iteration through the loop -- if we ingest new articles that match your search while you're retrieving content, the num_found will increase from one call to the next. You'll want to take the new value, to make sure you get all the new content!

previous_request_time = "2012-01-01%2011:30:00"  # note that we URL-escape the space to a %20 character
pagesize = 10  # how many articles we request back in each call
offset = 0     # as we iterate through, this will increase by the number of articles we request

request_url_base = ("http://api.newscred.com/articles?access_key=xxxx&fulltext=true&sort=modified"
                    + "&pagesize=" + str(pagesize)
                    + "&modified_since=" + previous_request_time)

while True:
    request_url = request_url_base + "&offset=" + str(offset)  # offset is an integer, so convert it before concatenating
    XMLTree = parse_xml(retrieve_url(request_url))
    # this is the total number of articles which match the query; re-read it on every pass and convert it to an integer
    num_found = int(XMLTree.get_node("/article_set/@num_found"))
    for article in XMLTree.get_node("/article_set/article"):
        article_text = article.get_node("description/text()")
        article_title = article.get_node("title/text()")
        article_source = article.get_node("source/name/text()")
        # ... extract other nodes here

    offset = offset + pagesize  # Move on to the next page of results
    if offset >= num_found:     # If we're at or past the total number, exit the loop
        break

Q.Retrieve new articles and their images

A.For the purposes of this example, we're going to skip the looping of the previous example (making multiple requests in order to get all the articles that the search found), and instead make just one request for articles and then get the images for each of those articles.

request_url = "http://api.newscred.com/articles?access_key=xxxx&pagesize=10&fulltext=true&sort=date&query=president+obama"
XMLTree = parse_xml(retrieve_url(request_url))
for article in XMLTree.get_node("/article_set/article"):
    article_guid = article.get_node("./guid/text()") # this is the GUID of a single article

    # Make a second request for the images that are associated with this individual article, by calling article/GUID/images, with your access key
    image_request_url = "http://api.newscred.com/article/" + article_guid + "/images?access_key=xxxx"
    ImageXMLTree = parse_xml(retrieve_url(image_request_url))

    # Go through the image nodes in the response of that API call, and process them with that article.
    all_images = []
    for image in ImageXMLTree.get_node("/image_set/image"):
        image_url = image.get_node("urls/large/text()")
        image_data = retrieve_url(image_url) # retrieve the image itself
        image_source = image.get_node("source/name/text()")
        # Parse other nodes here

        # Keep what we collected, so the list isn't empty when we process the article
        all_images.append((image_url, image_source, image_data))

    process_article_with_images(article, all_images)

Q.Look up a topic and request articles

A.There are two steps to looking up articles by topic. The first is to find the correct topic, and the second is to request articles that are tagged with that topic's GUID.

It's important to realize that a topic search can return several topics. For example, a search for "New York" will return a topic for New York City, New York State, New York Bay, New York Yankees, and anything else with "New York" in the name. In this example, we'll ask a user to select which topic they're interested in, but in reality you may want to do your topic lookups ahead of time and save the topic GUIDs!

topic_request_url = "http://api.newscred.com/topics?access_key=xxxx&query=barack+obama"
XMLTree = parse_xml(retrieve_url(topic_request_url))

topic_list = []

for topic in XMLTree.get_node("/topic_set/topic"):
    topic_name = topic.get_node("name/text()")
    topic_guid = topic.get_node("guid/text()")
    # Parse other nodes here

    topic_list.append([topic_name, topic_guid])
    # Print the topic as part of a list, for the user to choose from
    print("Topic %d: Name %s, Guid %s" % (len(topic_list)-1, topic_name, topic_guid))

# Ask the user what topic to use; input() returns a string, so convert it to a list index
topic_choice = int(input("What topic should we use? "))

chosen_topic_guid = topic_list[topic_choice][1] # get the chosen GUID from the list
articles_request_url = "http://api.newscred.com/articles?access_key=xxxx&topics=" + chosen_topic_guid
ArticleXMLTree = parse_xml(retrieve_url(articles_request_url))

# Loop over the articles response, not the earlier topics response
for article in ArticleXMLTree.get_node("/article_set/article"):
    process_article(article)
    # This is where you would do your article processing.

Q.Request articles from a single source, and then a source list

A.Individual sources are specified by GUID, but source lists are specified by name. You'll see this same pattern with topics and topic lists, as well as articles and article lists. This example requests articles from the Los Angeles Times, and then it requests articles from a (fictional) source list named "Los Angeles Newspapers."

# First, look up the source GUID for the LA Times
source_lookup_url = "http://api.newscred.com/sources?access_key=xxxx&query=los+angeles+times"
XMLTree = parse_xml(retrieve_url(source_lookup_url))
source_guid = XMLTree.get_node("/source_set/source[1]/guid/text()")

# Second, get articles from that one source
articles_lookup_url = "http://api.newscred.com/articles?access_key=xxxx&sources=" + source_guid
XMLTree = parse_xml(retrieve_url(articles_lookup_url))
article_list = XMLTree.get_node("/article_set/article")

# Now, look up articles from any source in the source list named "Los Angeles Newspapers"
source_list_lookup_url = "http://api.newscred.com/articles?access_key=xxxx&source_filter_mode=whitelist&source_filter_name=Los+Angeles+Newspapers"
XMLTree = parse_xml(retrieve_url(source_list_lookup_url))
source_list_articles = XMLTree.get_node("/article_set/article")

Times, Time Ranges, and Timestamps

Q.What are the different timestamps?

A. NewsCred puts three timestamps on each of our articles, images, and videos:

The created_at timestamp records the time NewsCred receives the content from the original publisher. Content is generally available through our API within five minutes of us receiving it.

The modified_at timestamp records the time NewsCred receives an update for this same piece of content, or a copy of the created_at timestamp if we haven’t received any updates for it. Updates are most frequent on breaking news articles or financial news articles (where stock prices can be updated throughout the day), but any article is eligible to be modified.

The published_at timestamp records the date and time that the original publisher published the article. This is usually very close to the time we receive the article, but not always. Some weekly print publications will move articles online in the days before or days after the print version is released, and in this case the timestamps could differ by several days. In the case of prolonged outages, we may also receive articles that were published several days in the past.

Q.How can I search for articles from a specific time window?

A. If you’re looking for articles from a specific time (for example, if you want to see what the top stories were on New Year’s Day), you can use the from_date and to_date parameters to specify the boundaries for the publication date (ie, the published_at value). This would look something like from_date=2013-01-01&to_date=2013-01-02 (if you don’t specify the time of day, we assume midnight). This will return all the articles published on New Year’s Day (midnight UTC on January 1, through midnight UTC on January 2).

If you’re looking for all the new articles we’ve received in our system since the last time you made a request, you should use the modified_since parameter. The easiest way to do this is to send either the last time you made the API call (in UTC), or a relative time based on how frequently you make API calls. For example, if you were polling the API every 30 minutes, you could send modified_since=-30MIN.
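The two kinds of time filters above can be generated in code rather than typed by hand. This is a minimal sketch with hypothetical helper names (day_window and since_last_poll); it only covers whole-day windows and the -NNMIN relative form shown above.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def day_window(day):
    """from_date/to_date covering one UTC day, midnight to midnight."""
    return {
        "from_date": day.isoformat(),
        "to_date": (day + timedelta(days=1)).isoformat(),
    }


def since_last_poll(minutes):
    """Relative modified_since value for a fixed polling interval."""
    return {"modified_since": "-%dMIN" % minutes}


print(urlencode(day_window(date(2013, 1, 1))))  # → from_date=2013-01-01&to_date=2013-01-02
print(urlencode(since_last_poll(30)))           # → modified_since=-30MIN
```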

Q.How frequently should I poll the API to make sure I always have the most up to date articles?

A. This depends on how many sources you’re licensing from us, and how many articles these sources publish. There’s no one size fits all here, and frequently it takes a little experimentation to get it right. We do have some rough guidelines, however:

  • We update the database of new articles roughly every three minutes. Polling for new articles more frequently than this will never yield new results.
  • With a few exceptions (principally the news wires - AP, AFP, Reuters, etc) most of our sources only send us updates every 15 or 20 minutes. Some sources only send us updates once a day. Depending on your particular mix of sources, it might not make sense to poll for updates more frequently than that.
  • While we do support pagesize values up to 999, in practice it’s usually better to request 100 or fewer articles at a time (larger pagesizes tend to make more sense for other API calls, such as topics or sources). The more data you request in a single call, the longer it takes us to retrieve all the data, and with pagesizes above 100 you start to risk timeouts. So while there’s a certain temptation to only poll a few times a day with a very large pagesize, it’s really a better practice to make more frequent calls for smaller batches.
  • If you do find yourself needing to pull several hundred articles at a time, we recommend making multiple API calls and using the offset parameter to retrieve all the data, rather than sending a single API call with a very high pagesize.