Blog Roundtable

From Stack Overflow

Executive Overview

The roundtable system consists of a central server hosting topics, one or more admins, one or more bloggers, and one or more passive readers. Admins take topic suggestions from an out-of-band channel (e.g. mailing lists, private email, or internet chat) and post topics to the central server. Bloggers read the topics and use them as prompts for essays and articles on their own blogs. Each topic includes a tiny snippet of HTML (the code for a clickable image) for the blogger to copy and paste into the blog entry. The blogger writes the blog post and includes the HTML at the bottom.

Passive readers will stumble across the central topic server or an individual blog. At the central server, they can see a list of topics, and for each topic can click through to a detail page, giving a list of all blog post responses. When the reader is at a blogger's site viewing a blog post response, they will read the essay and see the clickable image at the bottom. Potentially, the image has a logo or some other branding indicating what it's all about. It may also include dynamically generated information such as the number of people who have responded to the topic. At the very least, it should have some text indicating that clicking the image will bring the reader to the original topic with a list of all of the other blog responses.

Functional Overview

  • The definition of a roundtable in this context is a central website in which topics are posted. Bloggers use the topics as a prompt to write essays. The bloggers include a special image in their blog post response to the topic. To the user, the image is a logo or other piece of branding (perhaps dynamically generated to include the number of people who have responded to the topic.) Clicking on the image takes the user to the central server, where they are presented with a list of other bloggers' responses. To the roundtable software, the image is used as a trackback system to catalog responses.
  • The roundtable software itself is hosted at a central location with one or more admins. Bloggers compose the satellite locations. Direct links from the central server to all satellites exist. Each satellite has an image link back to the topic (which, in turn, has a list of other satellites.) This is slightly different from the original roundtable software, which employs a dropdown list and JavaScript to link to all other satellites. Not all blog software (specifically: hosted solutions like LiveJournal) lets users include JavaScript in their post. After a certain number of bloggers join, the dropdown ceases to be a scalable solution.
  • The roundtable server software runs as a standalone application, as opposed to being a blog or blog plugin, giving the maximum amount of freedom in hosting choices. The server may include extensions that aid in maintaining a topic blog in parallel. For instance, the admins may post a topic to the roundtable server AND to a blog, including JavaScript (generated by the roundtable server) in the blog post that gives a dynamic list of replies to the topic.
  • The satellite bloggers embed an image in their blog entry. The image's URL and the referrer sent with each request for it are used to catalog trackbacks.
    • The image URL will include the topic ID as a parameter. The original thought was that it would also include the blog ID, with blog owners going to the website to generate their own "paste this in your blog" code. This method was decided against for two reasons. First, it is easier on the blogger if all bloggers use the same image URL, as it can be included on the same page as the topic (rather than requiring a blogger to have a login and click off to a special page.) Second, it is much easier for an entry-level hacker to forge a URL by changing parameter values than it is for them to forge a referrer (although both can be forged), and forged blog IDs could lead to incorrect URLs and database pollution.
  • Admins must be able to enter a list of valid blog URLs (with details about the blog, such as name and contact information) to prevent trackback spam. Only trackbacks in the list of approved blog URLs will get displayed on the central server.
  • Bloggers will not need a login. They just read the topic, find the image code to copy into their blog, write their blog entry, and paste in the image.
  • If a blogger wants to be a member of the roundtable, they need to contact an admin and get their URL added to the list of known-good URLs. They can still write essays prompted by the topic and include the embedded image, but their posts will not show up on the master list until their blog has been approved by an administrator. Once they are approved, any previous blog posts will appear in the master list, not necessarily immediately, but as soon as a page view of their blog loads the dynamic image. See "Details By Function/Embedded Image" and "Gotchas" below for more information about trackbacks and permalinks.

Database

Roundtable database.png

Details By Function

Administrative Interface

All administration should be handled by the administrative interface. For the sake of simplicity, the initial version of the roundtable server will have that as a special set of pages in a standard HTTP password protected directory (e.g. /admin/).

Topic Editor

The admins should be able to add and edit topics using the administrative interface.

Blog Editor

The admins should be able to add, edit, and remove "approved" blogs using the administrative interface.

Response Editor (Remover)

In the event that a bogus blog entry response gets stored in the article table (either through a bug in the code or through malicious intent), admins should have the ability to remove the offending entries.

Topic List

All users should be able to see a list of current and past topics.

Topic Details

All users should be able to click on a topic and bring up the details, for example a list of all blog articles that are responses to the topic. This should also include a piece of HTML code that a blogger can add to his or her blog to indicate that it is a response to the topic. The code would be something like: <a href="{topic detail URL}"><img src="{dynamic image URL}"></a>
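As a rough illustration of how the topic detail page might generate that snippet, here is a minimal sketch. It is written in Python for brevity, although the design calls for PHP on the server. The function name `build_snippet` and the topic URL layout are assumptions made for this example, not part of the design.

```python
from html import escape


def build_snippet(topic_url: str, image_url: str) -> str:
    """Return the copy-and-paste HTML for one topic.

    Both URLs are HTML-escaped so that characters such as "&" in a
    query string do not break the attribute values.
    """
    return '<a href="{}"><img src="{}"></a>'.format(
        escape(topic_url, quote=True),
        escape(image_url, quote=True),
    )


# Hypothetical URLs; only the trackback.gif form appears in this document.
snippet = build_snippet(
    "http://blah.com/topic?id=12",
    "http://blah.com/trackback.gif?article=12",
)
print(snippet)
```

Because every blogger receives the same snippet for a given topic, the page can render it once per topic without knowing who is reading.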

Embedded Image

  • To the end user, the image should be something like "blah.com/trackback.gif?article=12"
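  • An Apache mod_rewrite rule should be used to map that URL to a PHP script. RewriteRule patterns match only the URL path, never the query string, and the query string is passed through to the substitution unchanged by default, so the rule does not need to mention the article parameter. For example:
RewriteEngine on
RewriteRule ^/?trackback\.gif$ /trackback.php [L]
  • The PHP page should serve up the appropriate response header, for example:
header("Content-Type: image/gif");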
  • If no article ID is given, then no database work should be done and the image should simply be served up and the script should exit.
  • The page should look at the remote referrer URL. If it is blank (e.g. the user has a privacy guard that blocks referrers), then no database work should be done and the image should simply be served up and the script should exit.
  • If the referrer string is present, try to match it to a blog ID. Also parse the URL to determine if it is a permalink (versus a homepage view, archive view, etc.)
  • If it doesn't match to a blog ID and/or is not a permalink URL, treat it as if it wasn't there (no more DB access, simply serve up the image and end the script)
  • See if there is already an entry in the article URL table for the given topic ID and blog ID. If not, add it.
  • Serve up the image and exit
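
The decision steps above can be sketched as a single function. This is a Python stand-in for the PHP script, under stated assumptions: `record_trackback`, the `approved_blogs` mapping (host name to blog ID and permalink pattern), and the `responses` dictionary standing in for the article URL table are all hypothetical names invented for this example.

```python
import re
from urllib.parse import urlparse


def record_trackback(topic_id, referrer, approved_blogs, responses):
    """Return True only when a new response row is stored.

    Whatever this returns, the caller still serves the image afterwards.
    approved_blogs: host name -> (blog_id, compiled permalink regex)
    responses: (topic_id, blog_id) -> permalink URL (hypothetical table)
    """
    # No article ID or no referrer: skip all database work.
    if not topic_id or not referrer:
        return False
    parsed = urlparse(referrer)
    host = (parsed.hostname or "").lower()
    blog = approved_blogs.get(host)
    # Unknown blog: treat the referrer as if it were absent.
    if blog is None:
        return False
    blog_id, permalink_re = blog
    # Front pages, archives, and so on are ignored; only permalinks count.
    if not permalink_re.search(parsed.path):
        return False
    key = (topic_id, blog_id)
    if key in responses:
        return False  # already catalogued
    responses[key] = referrer
    return True


approved_blogs = {"example.com": (1, re.compile(r"^/[0-9]{4}/[0-9]{2}/.+"))}
responses = {}
first = record_trackback("12", "http://example.com/2008/05/my-essay", approved_blogs, responses)
repeat = record_trackback("12", "http://example.com/2008/05/my-essay", approved_blogs, responses)
front_page = record_trackback("12", "http://example.com/", approved_blogs, responses)
```

Keeping the decision separate from serving the image makes it easy to guarantee that every code path ends with the same GIF response.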

The side-effect of this methodology is that an article will not show up on the central server as a response to a topic until someone has viewed the article from the permalink page. Presumably, the author will do this immediately after the article has been written. See the "Gotchas" section, below, for a more detailed analysis.

RSS

It is important to have at least two RSS feeds: one for strictly topics (so that bloggers can be informed of new topics without getting drowned in a sea of responses) and one for both topics and recent responses (so that they can read up on the new responses.) Some bloggers will favor the one and some will favor the other, but it is important to have both options.
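
One way to produce both feeds from the same data is to filter a single list of entries. The sketch below assumes a minimal RSS 2.0 layout and a hypothetical `kind` field on each entry; the titles, links, and the `build_feed` helper are illustrative only.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, items):
    """Serialize a minimal RSS 2.0 channel containing the given items."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical entries; "kind" separates topics from blog responses.
entries = [
    {"kind": "topic", "title": "Topic 12", "link": "http://blah.com/topic?id=12"},
    {"kind": "response", "title": "Response to topic 12",
     "link": "http://example.com/2008/05/my-essay"},
]

topics_feed = build_feed(
    "Roundtable topics", "http://blah.com/",
    [e for e in entries if e["kind"] == "topic"],
)
full_feed = build_feed("Roundtable topics and responses", "http://blah.com/", entries)
```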

JavaScript

It would be nice to have dynamic JavaScript listing all responses for a given topic. As mentioned above, this could be used by the administrators for keeping a blog in parallel with the roundtable server.

Gotchas

This section is dedicated to some of the tricky aspects of the project that need to be implemented well.

Referrer URLs and Permalinks

Using only the image's referrer and matching on domain name is not quite going to be enough to get a link to the article in question. This is because the same blog post can be viewed at a variety of URLs: the front page, an archive by month, the specific article's page, a LiveJournal friends list view, an RSS reader (standalone or web-based), etc. Basically, we're interested in only the permalink URL and only want to record the image's referrer URL if it's being viewed from a permalink page. In order for a response to show up in the central server, the image needs to be viewed from a permalink page and we need to key on that permalink (ignoring any other page views from the same domain.) A few common blog software formats for permalinks include:

  • LiveJournal: username.livejournal.com/{number}.html
  • WordPress: example.com/{directory}/{year}/{month}/{title}
  • WordPress (hosted): {username}.wordpress.com/{year}/{month}/{title}
  • Blogger/Blogspot: {username}.blogspot.com/{year}/{month}/{title}

There are a couple of solutions to deal with this. One is to have a user-configurable regular expression that is stored as a field in the blog URL table. Any given blog will have a RegEx that tells whether or not the given URL is a permalink. This puts the burden on the administrator to enter an appropriate RegEx when creating an entry for a blog, which might be complex but is perhaps simplified by including several canned presets with an "Other" that lets the admin enter their own freeform RegEx.
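
A minimal sketch of the preset idea follows, assuming the pattern is matched against the URL path only. The preset names, the patterns themselves, and the `permalink_pattern` helper are assumptions made for this example; a real deployment would tune them per blog platform.

```python
import re

# Hypothetical canned presets an admin could pick from.
PERMALINK_PRESETS = {
    # e.g. /12345.html
    "livejournal": r"^/[0-9]+\.html$",
    # e.g. /2008/05/my-essay, optionally below one or more directories
    "dated": r"^/(?:[^/]+/)*[0-9]{4}/[0-9]{2}/[^/]+",
}


def permalink_pattern(preset, custom=None):
    """Return the compiled pattern to store for a blog.

    "other" uses the admin's freeform expression; re.compile raises
    re.error if that expression is not valid, so bad input is caught
    when the blog entry is saved rather than at trackback time.
    """
    if preset == "other":
        if not custom:
            raise ValueError("a custom expression is required for 'other'")
        return re.compile(custom)
    return re.compile(PERMALINK_PRESETS[preset])


dated = permalink_pattern("dated")
is_post = dated.search("/blog/2008/05/my-essay") is not None
is_front_page = dated.search("/") is not None
```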

Another solution is to hard-code some general rules. With the exception of LiveJournal, it looks like most hosted and user-installable blog software is similar and the following rules can be used:

  • Check that "/####/##/sometext" is present ("/[0-9]{4}/[0-9]{2}/.+" as a regular expression); if not, it's not a permalink
  • Match the domain name against the domain names in the approved blog URL table
  • If both rules pass, then it's a permalink and should be added to the table of topic responses

The following rules apply to the LiveJournal exception:

  • Check that "livejournal.com" is in the URL
  • With a RegEx, check that the URL is "{something}.livejournal.com/{numbers}.html"
  • Match the domain name against the approved blog URL table
  • If all rules pass, then it's a permalink and should be added to the table of topic responses
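
Both rule sets can be combined into one check. The sketch below is an assumption-laden illustration: `is_permalink` and the `approved_hosts` set are hypothetical, and the approved blog URL table is reduced to a set of host names.

```python
import re
from urllib.parse import urlparse

# General rule from the list above: /####/##/sometext somewhere in the path.
DATED_PATH = re.compile(r"/[0-9]{4}/[0-9]{2}/.+")
# LiveJournal exception: the whole path is {numbers}.html.
LIVEJOURNAL_PATH = re.compile(r"^/[0-9]+\.html$")


def is_permalink(url, approved_hosts):
    """Apply the hard-coded rules to a referrer URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # The domain must belong to an approved blog in either case.
    if host not in approved_hosts:
        return False
    if host.endswith(".livejournal.com"):
        return LIVEJOURNAL_PATH.match(parsed.path) is not None
    return DATED_PATH.search(parsed.path) is not None


approved_hosts = {"alice.livejournal.com", "example.com"}
lj_post = is_permalink("http://alice.livejournal.com/12345.html", approved_hosts)
lj_friends = is_permalink("http://alice.livejournal.com/friends", approved_hosts)
dated_post = is_permalink("http://example.com/blog/2008/05/my-essay", approved_hosts)
```

Note that the general rule also accepts any other URL that continues past the month, such as a paginated monthly archive, so it may need tightening for some blog software.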

A third solution is to have the blogger generate a special custom snippet of HTML code to put in their blog. For instance, a web form can take a topic and a permalink URL and generate some HTML similar to "<a href="{topic URL}"><img src="{dynamic image url}?article=12&permalink={URL-encoded permalink}"></a>". This puts the burden on the blogger, though, and suffers from a chicken-and-egg problem. The user doesn't always know the permalink until after the blog entry has been posted, at which point they will have to go back to the central roundtable server, generate the HTML inclusion, edit their post, and insert the linkback image. Because this requires extra steps, some of which are manual, it is prone to error and less attractive than the referrer-based solutions.
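
For illustration, the web form's output could be built as in the sketch below. The `build_custom_snippet` helper and the topic URL layout are assumptions for this example. The permalink is percent-encoded so it survives as a single query parameter, and the finished URL is then HTML-escaped for use inside the attribute.

```python
from html import escape
from urllib.parse import quote


def build_custom_snippet(topic_url, image_url, topic_id, permalink):
    """Return per-post HTML carrying the permalink in the image URL."""
    src = "{}?article={}&permalink={}".format(
        image_url,
        quote(str(topic_id), safe=""),
        quote(permalink, safe=""),  # encode ":" and "/" as well
    )
    return '<a href="{}"><img src="{}"></a>'.format(
        escape(topic_url, quote=True),
        escape(src, quote=True),  # "&" becomes "&amp;" inside the attribute
    )


# Hypothetical URLs for a single blog post.
custom_snippet = build_custom_snippet(
    "http://blah.com/topic?id=12",
    "http://blah.com/trackback.gif",
    12,
    "http://example.com/2008/05/my-essay",
)
print(custom_snippet)
```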

Without Using Referrers

A fourth option is to not use the referrer header of the image--to not even run any code when the image is requested. The image is not dynamic and it links to the topic. The blogger copies and pastes the image link code into their blog entry. Once the blogger has published the blog post, they take the permalink URL and return to the roundtable server. They log in (in this case, bloggers will have to have logins) and register the permalink URL as being a response to a given topic. This option is the most secure, but requires some extra steps and configuration screens:

  • When blogs are added to the system by administrators, a login for that blogger will also be generated. The blogger uses that login to post blog entries. (Alternatively, we could set up such a system in which the blogger can sign up on his or her own, without being entered/approved by an administrator--similar to a user signing up within forum software, possibly with admin approval)
  • Blogger login and "here is my response to this topic" pages will need to be written
  • Typical overhead stuff: "I forgot my password", verifying by sending an email, etc.

Hacking Referrer Headers

The weak point of the system is the referrer header. Browsers are on the honor system about sending along referrers. Some privacy applications will either strip the header or rewrite it to the root domain name. Both operations are harmless to the system, although they prevent us from capturing the trackback permalink URL from that particular reader. We assume that at least one reader without such privacy blockers will view the permalink page, which is a reasonable assumption.

On the other hand, a malicious hacker can craft an HTTP request that includes a bogus referrer. By using a list of approved blogs and blog URLs, we negate the chance of a spammer using this mechanism to obtain trackback links to a spammer webpage. However, a person with a grudge who just wants to mess with the system could craft HTTP image requests with referrer links to approved blogs that lead to 404 pages or incorrect blog entries. While this scenario is unlikely within our community, it is still possible. In the event that this occurs once or twice, the bad data can be removed from the article table. If this occurs more often, the offending IP address that generated the bad requests can be added to a blacklist. This blacklist can be added to the code itself (and maintained in the administrative section) or it can be a simple Apache .htaccess file containing a list of banned IP addresses.
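
If the blacklist lives in the code rather than in .htaccess, the check could look something like the sketch below. The `should_skip_database` helper and the sample addresses (taken from documentation-only ranges) are assumptions for this example. In this variant a banned client still receives the image, and only the database work is skipped.

```python
import ipaddress


def should_skip_database(remote_addr, banned_networks):
    """Return True when the request should not touch the article table."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparsable address: be conservative and record nothing.
        return True
    return any(addr in network for network in banned_networks)


# Hypothetical blacklist: one single address and one whole range.
BANNED = [
    ipaddress.ip_network("203.0.113.7/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]

blocked = should_skip_database("198.51.100.42", BANNED)
allowed = should_skip_database("192.0.2.1", BANNED)
```

Storing networks rather than bare addresses lets an admin ban a whole range with one entry if an attacker moves between neighboring addresses.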
