Monday, August 17, 2009

GNU sed (Stream EDitor)

sed -r 's/\t+/,/g'


sed: invoke the stream editor
-r: use extended regular expressions (similar to using the -E argument for grep). This gives meaning to the '+' character in my regex.
s: tells sed that we are doing a replacement ("substitution") operation
\t+: find occurrences of one or more tab characters
,: replace each match with a comma
g: do this substitution for every occurrence of \t+ on a line, not just the first
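Here is a minimal sketch for trying the one-liner on a small sample. It assumes GNU sed (the \t escape in a regex and the -r flag are GNU extensions) and uses hypothetical file names, data.tsv and data.csv. It also shows one behavior worth knowing: because \t+ treats a run of tabs as a single separator, an empty field between two tabs disappears, so replacing tabs one-for-one with 's/\t/,/g' is safer when the data may contain blanks. Neither version adds quotes, so fields that already contain commas would still need extra handling.

```shell
#!/bin/sh
# Assumes GNU sed: \t inside a regex and the -r flag are GNU extensions.

# Build a tiny tab-separated sample; Bob's "age" field is empty.
printf 'name\tage\tcity\nAlice\t30\tProvo\nBob\t\tOrem\n' > data.tsv

# The command from the post: each run of tabs becomes a single comma.
sed -r 's/\t+/,/g' data.tsv > data.csv
cat data.csv
# name,age,city
# Alice,30,Provo
# Bob,Orem            (the empty age field is gone)

# One comma per tab keeps empty fields in place.
sed 's/\t/,/g' data.tsv > data-keep-empty.csv
cat data-keep-empty.csv
# name,age,city
# Alice,30,Provo
# Bob,,Orem
```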

So, today I had a problem.  A friend needed me to convert a 10 MB data file from tab-separated format to comma-separated format.

"This should take about 2 seconds."

I wasn't on my trusty little laptop (running Ubuntu 9.04 Jaunty Jackalope since March) and was stuck using a lab computer on campus, which was, of course, running Windows XP with no useful utilities whatsoever.  To try to save some time, I tried to do this conversion right on my friend's computer.  We opened the document in MS Word and tried a Find and Replace, converting tabs to commas.

Slow.  Killed the program several minutes into the operation.

Next, over to my trusty laptop.  Loaded up jEdit, a handy programming editor that has done well for me in the past.  Tried to do the find and replace.

Also slow.  Killed this about 10 minutes into the operation.  "It really shouldn't be taking this long."  What went wrong?  jEdit was out of memory.  I found that out from the command-line terminal where I launched jEdit.  Hmm... Maybe some kind of error box would have been nice so I didn't just sit there for 10 minutes wondering. ;)

No more of this garbage.  We're going to the command line.

Always go to the command line.

I already knew about sed, but my memory was a little rusty on the command-line arguments.  After about 10 minutes, I finally found what I was looking for.

Converted the file in about 2 seconds.

Why is it that something that should take 2 seconds always takes 30 minutes?

Monday, April 13, 2009

Shell script for Google search result parsing

This is the shell script I wrote to help me perform the analysis I did for Quest 5.

1. Perform a site:yoursite.edu search in Google, displaying 100 results per page.
2. Save each results page (Google will only give you 10 pages at most) into a folder named yoursite.edu.
3. Download the shell script to the directory that contains the yoursite.edu directory.
4. At the command prompt, type:
./google-results-parse yoursite.edu

5. OR, if you named the yoursite.edu directory something different, run this:
./google-results-parse yoursite.edu savedresultsdirectory

6. It will create a "savedresultsdirectory-parsed" directory, which will contain a "domainlist" file and a "pagelinks" directory. The "domainlist" gives the subdomain breakdown of the search results.  The "pagelinks" folder contains files for each subdomain that include all of the search result URLs for that subdomain.
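The downloadable script isn't reproduced in this post, so the following is only a rough, hypothetical sketch of the same idea, not the original google-results-parse. It assumes the saved pages are HTML files containing absolute http:// or https:// links to the site, and that a URL's subdomain is everything between "//" and the next "/". The function names parse_results and escape_dots, the extra "allurls" file, and the regular expressions are illustrative assumptions; Google's result markup changes over time, so the link pattern would likely need adjusting.

```shell
#!/bin/sh
# Hypothetical sketch of a google-results-parse-style script (NOT the original).
# Usage: parse_results SITE [SAVED_DIR]
#   SITE       the domain used in the site: search, e.g. yoursite.edu
#   SAVED_DIR  folder of saved result pages (defaults to SITE)

# Escape dots so a hostname like yoursite.edu matches literally in a regex.
escape_dots() {
    printf '%s' "$1" | sed 's/\./\\./g'
}

parse_results() {
    site=$1
    dir=${2:-$1}
    out="$dir-parsed"
    mkdir -p "$out/pagelinks"

    # Pull every absolute link to SITE or one of its subdomains out of the
    # saved pages (recursively, without filenames) and de-duplicate them.
    grep -rhoE "https?://([A-Za-z0-9-]+\.)*$(escape_dots "$site")(/[^\"'<> ]*)?" "$dir" |
        sort -u > "$out/allurls"

    # domainlist: number of distinct result URLs per subdomain, largest first.
    sed -E 's#^https?://([^/]+).*#\1#' "$out/allurls" |
        sort | uniq -c | sort -rn > "$out/domainlist"

    # pagelinks/<subdomain>: every result URL that belongs to that subdomain.
    while read -r _ host; do
        grep -E "^https?://$(escape_dots "$host")(/|\$)" "$out/allurls" \
            > "$out/pagelinks/$host"
    done < "$out/domainlist"
}

# When run as a script with arguments, parse the given directory.
if [ $# -gt 0 ]; then
    parse_results "$@"
fi
```

Saved as an executable file, this sketch would be run the same way as in steps 4 and 5. The "sort | uniq -c" step is what produces the per-subdomain counts in "domainlist".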

Download the file here.

Open Ed. Quest 5 -- Searching for a Better Way (to Search)

Quest 5


"Many BYU faculty already openly share their syllabi and other course materials on personal websites, through iTunesU, and through other mechanisms ... Find as many of the open educational resources being shared by BYU faculty as you can..."

It seems to me that discoverability is going to be the make-or-break issue for OER.  One could produce world-class, high-quality OER that trumps everything any institutional OER effort produces, and yet remain in complete obscurity with no hope of ever actually sharing these wonderful OER with anyone at all.  And after all, if you take the time and trouble to make a resource with openness in mind, it seems silly for it to end up completely worthless (or at least gravely underused) because you weren't able to put it somewhere people would find it.

This post isn't going to discuss the hows and whys of publishing open educational content for maximum discoverability; we'll save that for another time.  However, Quest 5 gives us the specific assignment to comb through BYU's web presence looking for faculty-produced OER content, and it raises the question, "How would one go about finding all of the OER on a university's web space?"

The task is not trivial.

Thursday, April 2, 2009

Copyright in Distance Education

(It is at this time that I would like to make a plug for Creative Commons licenses.  Thank you.)

I think I've talked more about copyright this semester than at any other time in my entire life.  This is not surprising, however, since I'm like most people in many respects, and I assume most people aren't well versed in the subtle nuances and intricacies of US copyright law, including the Digital Millennium Copyright Act (DMCA) and the Technology, Education, and Copyright Harmonization Act (TEACH).

What a mouthful.

Wednesday, March 18, 2009

Accessibility in Online Distance Education Courses

A while back I ranted about Google's CAPTCHA human detection implementation and how it sometimes makes it practically impossible to sign up for a Google Account.  But the moment of truth arrived when I listened to the accessibility recording of the CAPTCHA.  In case you forgot, I can summarize it with one word -- psychotropic.  (Usual disclaimers against drug use apply.  Seriously, kids, don't do drugs.)

So, this experience made me start thinking.  I've done my fair share of web development, so it's not like I was completely unfamiliar with accessibility issues.  I know that images need to have alternate text, I know that it's good practice to put a "Skip to the Content" link at the top of the page to skip over navigational links, etc., etc.  However, I didn't really begin to understand what it was all about until I finally had an experience on the web where I was prevented from doing something I wanted to do because my senses were unable to decode the information being presented to me.

Saturday, March 14, 2009

Quest 2 - For Real Now

Up to this point in the course we've done a lot of talking.  We've had great discussions about the history of the open education movement, usage rights, sustainability models, reusability, remixability--even hippies!--and just about everything in between.  But now it's time to get to work!  No more talking!  This is going to be fun.

So, looking ahead to Quest 6, we in our guilds will need to collaborate to create a course entirely out of open educational resources.  Because we have so little time left, we decided as a class that we would devote everything we do in the remaining quests to work toward our goals for Quest 6.

The course that we as a class originally (more on that later) decided to build is 10th-grade social studies--World Civilizations.  We will attempt to build this course entirely from OERs that meet the Utah State K-12 Core Curriculum Standards for World Civilizations.

So I decided that for Quest 2 I would just jump into the pool and do my best to find as many OERs as possible that could help us meet the objectives and standards set out in the Utah K-12 Core.  This actually turned out to be the first time I've ever made a real attempt to collect a large number of OERs from multiple repositories for a single purpose.  Everything that follows here is a description of my first purposeful experience looking for OERs.

If at any point you feel like cutting to the chase, click on this link or just scroll to the end.

Thursday, February 26, 2009

Accessibility Issues are No Laughing Matter

Except this one really made me laugh quite loudly.  I was trying to create a new Google account so there would be a webmaster email address for the BYU PSST research group's website.  As I went about happily filling out information for this new account, I suddenly hit a wall when Google wanted me to type in some letters that looked all swirly and mashed together like a trick one's eyes might be playing when one has been smoking peyote.  (I would, at this point, like to disclose that I have never actually smoked peyote and don't know if the preceding comparison is a good one or not.  I would also like to discourage anyone from smoking peyote to find out.  Moving on.)