CS11 Erlang - Lab 4 - RSS Queues

To continue our work on the RSS Feed Aggregator project, this week we will implement another central component of the aggregator: an "RSS queue" process for holding and managing RSS feed items. When our server begins to track a new RSS feed, we can start up a new queue process to hold the RSS records from that feed. The queue will receive messages that contain RSS feed items, and update its internal state based on whether the item is new, an updated version of an older item, and so forth.

From this point forward, you should also use Erlang documentation markup that can be processed by edoc. At the end of this lab, it should be possible to run edoc on your code and generate good documentation.
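
As a rough illustration of what edoc markup looks like, here is a minimal sketch. The module name edoc_sketch, the author name, and the newest/1 function are hypothetical and are not part of the lab; they only show where the comments go.

```erlang
%% @author A. Student
%% @doc Hypothetical example module showing edoc markup.
%% The first sentence of a `@doc' comment is used as the summary.
-module(edoc_sketch).
-export([newest/1]).

%% @doc Returns the last element of a non-empty list.
%% In a queue kept in increasing publication-time order, this is the
%% most recently published item.
-spec newest([T, ...]) -> T.
newest(Items) when is_list(Items), Items =/= [] ->
    lists:last(Items).
```

From the Erlang shell, something like edoc:files(["edoc_sketch.erl"], [{dir, "doc"}]). should then write HTML documentation into a doc directory.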

Here is a specification of what you should build:

  1. Create a module named rss_queue for the RSS-feed queue implementation.

  2. The module should provide a server function that actually implements the server loop of the queue process.

    For now, the queue's state will only consist of a list of RSS feed items that are currently in the queue. The RSS feed items need to be kept in increasing order of publication-time, so be very careful about managing the queue order.

    We will extend the state of this queue process in subsequent labs, as we need to keep track of additional information.

  3. The server function needs to handle the following messages:

    {add_item, RSSItem}

    This is the most complex operation that the queue must support, mainly because there is a lot of book-keeping work to do.

    Another process can send a tuple containing add_item and the actual RSS feed item to add, and the queue process will update its internal list to include the new item. Using the rss_parse:compare_feed_items/2 function you created in lab 3, you can update the queue in this manner:

    • If there is any item in the queue that produces same for the comparison, then the incoming item is already in our queue, so we can just ignore it.
    • If there is any item in the queue that produces updated for the comparison, then the old version needs to be removed from the queue before the new item is added. Note that you shouldn't just put the new item where the old one was, because the new item may have a modified publication date.

      Once you have removed the old version of the item, you should update the queue as for the next case:

    • Finally, if no item in the queue is the same as this item, add the item into the queue, taking care to keep all items in increasing order of publication date.

    You should definitely break this entire procedure into multiple helper functions for the various steps; anything else would be impossible to maintain. (Plus, you will definitely have to extend this functionality in a future lab!)
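
The three cases above can be sketched with a few small helpers. This is only one possible decomposition, under loud assumptions: items are modeled as {Guid, PubTime, Title} tuples rather than #xmlElement records, the local compare/2 is a stand-in for rss_parse:compare_feed_items/2, and the result atom different for "no match" is made up for this sketch. The module and helper names are hypothetical.

```erlang
-module(queue_update_sketch).
-export([add_to_queue/2]).

%% Stand-in for rss_parse:compare_feed_items/2 (assumption: items are
%% {Guid, PubTime, Title} tuples, and a shared Guid means "same story").
compare(Item, Item) -> same;
compare({Guid, _, _}, {Guid, _, _}) -> updated;
compare(_, _) -> different.

%% Returns the new queue after processing NewItem.
add_to_queue(NewItem, Queue) ->
    case find_match(NewItem, Queue) of
        {same, _Old} ->
            Queue;                                    % already queued
        {updated, Old} ->
            %% Drop the old version, then insert by publication time.
            insert_sorted(NewItem, lists:delete(Old, Queue));
        none ->
            insert_sorted(NewItem, Queue)
    end.

%% Finds the first queued item that compares as same or updated.
find_match(_NewItem, []) -> none;
find_match(NewItem, [Old | Rest]) ->
    case compare(Old, NewItem) of
        same -> {same, Old};
        updated -> {updated, Old};
        _ -> find_match(NewItem, Rest)
    end.

%% Inserts Item so that publication times stay in increasing order.
insert_sorted(Item, []) -> [Item];
insert_sorted({_, Time, _} = Item, [{_, HeadTime, _} = Head | Rest])
  when Time >= HeadTime ->
    [Head | insert_sorted(Item, Rest)];
insert_sorted(Item, Queue) -> [Item | Queue].
```

Because the updated case removes the old version and then reuses the ordinary insert, an item whose publication date changed ends up at its new position rather than the old one.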

    {get_all, ReqPid}

    Another process can send this message to the RSS queue to request the entire contents of the queue. Of course, this requires that the requester include its own process ID (PID) in the message, so that the queue knows where to send the reply.

    Handling this message is very simple: the queue can just send its list of items back to the requester. It doesn't need to be any more complicated than this.
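
A server loop handling the two messages described above might look roughly like the following. Several things here are assumptions rather than requirements: the loop takes the item list as its single argument, the reply is tagged as {QueuePid, Items} so the requester can match on it, and update_queue/2 is only a placeholder for the real same/updated book-keeping.

```erlang
-module(queue_loop_sketch).
-export([server/1]).

%% Server loop; Items is the current queue contents.
server(Items) ->
    receive
        {add_item, Item} ->
            server(update_queue(Item, Items));
        {get_all, ReqPid} when is_pid(ReqPid) ->
            %% Assumed reply shape: {QueuePid, Items}.
            ReqPid ! {self(), Items},
            server(Items)
    end.

%% Placeholder only: keeps plain terms in Erlang term order and drops
%% exact duplicates. The real version would compare feed items and
%% order them by publication time.
update_queue(Item, Items) ->
    lists:usort([Item | Items]).
```

The state is carried only in the argument of the tail-recursive call, so each message handler decides what the next state is.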

    These are all the messages that the server needs to handle at this point. We will enhance the queue's capabilities in time.

  4. Create and export a helper function for starting a new priority-queue server process. This function should be called start, and for now it will not need any arguments. The return-value should be the PID of the new queue process.

    You can use the spawn BIF to start the new process. You should use the version that takes a module/function/arguments (MFA) triple, not the version that takes a fun. There are several reasons for this; one is that we can't pass arguments to a fun using spawn! We will talk about the other reasons in a future class. Also, you should use ?MODULE for the module-name argument.
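
As a minimal sketch of the MFA form of spawn, assuming the server function is named server and takes the initial (empty) item list as its one argument; the module name and the do-nothing loop are hypothetical stand-ins:

```erlang
-module(queue_start_sketch).
-export([start/0, server/1]).

%% Starts a new queue process with an empty item list and returns its PID.
start() ->
    spawn(?MODULE, server, [[]]).

%% Stub loop for this sketch: ignores every message it receives.
server(Items) ->
    receive
        _Any -> server(Items)
    end.
```

Note that the argument list passed to spawn is [[]]: a one-element list whose single element is the empty queue.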

  5. Create and export a helper function add_item(QPid, Item) to simplify adding an item to a queue process. QPid is the PID of the queue process, and Item is the item to add to the queue. The helper function just needs to perform these tasks:

    • Use a when-guard and the is_pid/1 BIF to make sure that the value passed for QPid is actually a PID.
    • Send an {add_item, Item} message to the queue process.
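
A helper along these lines could be as short as the sketch below. Returning ok is an assumption (the text does not specify a return value), and the module name is hypothetical.

```erlang
-module(queue_add_sketch).
-export([add_item/2]).

%% Sends Item to the queue process. The guard makes a call with a
%% non-PID first argument fail with a function_clause error.
add_item(QPid, Item) when is_pid(QPid) ->
    QPid ! {add_item, Item},
    ok.
```

Since the send is asynchronous, the helper returns immediately; it does not wait for the queue to finish processing the item.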

  6. Create another helper function add_feed(QPid, RSS2Feed), where the second argument is an #xmlElement corresponding to an RSS feed, rather than just a single element. This function should extract all items from the feed document, and then send each of these items to the queue in order, perhaps using the previous function.

    This should be an easy function to implement, given all of your other work, and given the very helpful lists:foreach function.

    The function should return the atom ok when completed.
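
The shape of such a function might be sketched as follows. To keep the example self-contained, the "feed" here is assumed to be a plain list of items, and get_feed_items/1 is a local stand-in for whatever item-extraction function you wrote in lab 3; both are assumptions of this sketch, as is the module name.

```erlang
-module(queue_feed_sketch).
-export([add_feed/2]).

%% Stand-in for the lab 3 item extractor (assumption: the feed is
%% already a plain list of items, not an #xmlElement).
get_feed_items(Feed) when is_list(Feed) -> Feed.

%% Sends every item of the feed to the queue, in feed order.
add_feed(QPid, Feed) when is_pid(QPid) ->
    Items = get_feed_items(Feed),
    lists:foreach(fun(Item) -> QPid ! {add_item, Item} end, Items),
    ok.
```

Messages sent from one process to another arrive in the order they were sent, so the queue sees the items in feed order.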

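  7. Create a helper function get_all(QPid), which simplifies the process of retrieving the list of feed items from the process. Again, make sure that QPid is actually a PID. Note that this helper will perform a send operation and then a receive operation: it sends a get_all message containing its own PID to the queue, then waits for the queue's reply and returns the list of items.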
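
The send-then-receive pattern could be sketched like this. It assumes the queue replies with a {QueuePid, Items} tuple, which is a convention chosen for this example rather than something the lab requires; the 5-second timeout and the {error, timeout} result are likewise assumptions.

```erlang
-module(queue_get_sketch).
-export([get_all/1]).

%% Asks the queue for its contents and waits for the reply.
get_all(QPid) when is_pid(QPid) ->
    QPid ! {get_all, self()},
    receive
        %% Matching on QPid ignores unrelated messages in the mailbox.
        {QPid, Items} -> Items
    after 5000 ->
        {error, timeout}
    end.
```

Tagging the reply with the queue's PID lets the caller tell this answer apart from any other messages that happen to be waiting in its mailbox.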

When you compile this module, keep an eye out for warnings about unused functions. You should have no warnings when you compile your code. You will probably need to export your server function, because Erlang has this annoying habit of eliminating functions that it thinks are unused. Any module function that is not exported, or directly/indirectly called by an exported function, is excluded from the resulting BEAM file. Since Erlang can't tell that spawn() is invoking your server function, it will be eliminated from the compiled result, unless you specifically export it.

Testing

Once you have completed all of this work, you need to test your new RSS feed queue very carefully. You can use the two documents from before, which are snapshots of the digg.com science feed, a few hours apart. Thus, the second snapshot contains many old items, as well as a few new items. You should also try retrieving your own RSS 2.0 feeds from various websites, such as CNN.com or BBC News. You can also try the Slashdot RSS feed, which is an RSS 1.0 feed, and should properly be flagged as NOT an RSS 2.0 feed by your helper functions.

Write up a file called testing.txt that contains the testing operations you use to exercise your queue. Include this file with your submission so that we can tell you have properly exercised your code.

When you are convinced that your RSS queue works properly, submit it on csman.


Copyright (C) 2009-2012, California Institute of Technology. All Rights Reserved.
Last updated November 5, 2012.