captain holly java blog

Learning Python Painlessly

Posted in Python, TDD by mcgyver5 on October 26, 2011

I’ve been teaching a ten-year-old how to program and I can say that Python is a great first language.

  • avoids the overhead of Java by being dynamically typed and not needing a separate compile step
  • has both object oriented and functional features
  • widely supported
  • readable
  • huge range of libraries and tools

It has been pretty quick for me to learn as well, since I have to study a bit to stay ahead of the ten-year-old. I’ve found that a fast way to learn is to use Python’s included unittest framework and follow the TDD model illustrated succinctly in the Bowling Game Kata and the Prime Factors Kata. (What is a Kata? Read This)
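As a rough sketch of what that looks like for the Prime Factors Kata: each test is written first, fails, and then drives the next small change to the function. The names prime_factors and PrimeFactorsTest are my own choices for illustration, not taken from the kata write-ups.

```python
import unittest


def prime_factors(n):
    """Return the prime factors of n in ascending order, with repeats."""
    factors = []
    divisor = 2
    while n > 1:
        # Divide out each divisor as many times as it goes in evenly.
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    return factors


class PrimeFactorsTest(unittest.TestCase):
    # Each test was the "next failing test" at some point in the kata.
    def test_one_has_no_factors(self):
        self.assertEqual(prime_factors(1), [])

    def test_prime_is_its_own_factor(self):
        self.assertEqual(prime_factors(7), [7])

    def test_repeated_factors(self):
        self.assertEqual(prime_factors(12), [2, 2, 3])
```

Saved to a file, the tests can be run with python -m unittest followed by the file name.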

Ted Dziuba memorably said:

You know what’s more awesome than spending my Saturday afternoon learning Haskell by hacking away at a few Project Euler problems? Fuck, ANYTHING.

and he has a point, but I watched my dad study medical journals when we were on vacation and I know that it takes time and hard work to keep up on technology. Anyhow, I’m amazed at how the combination of TDD and Python helps me whiz through those Project Euler problems.

When one problem is done, the forums offer great insights.  For example, for problem number 4, I came up with:

def isPal(n):
    # a number is a palindrome if its digits read the same reversed
    st = str(n)
    return st == st[::-1]

def prods():
    # largest palindrome that is a product of two 3-digit numbers
    largest = 0
    for kk in range(999, 99, -1):
        for kz in range(kk, 99, -1):
            product = kk * kz
            if product > largest and isPal(product):
                largest = product
    return largest

but a poster posted this one-liner:

print(max([x*y for x in range(900,1000) for y in range (900,x) if str(x*y) == str(x*y)[::-1]]))

I did not know you could do that. The idiom “x*y for x in range….” and the if clause attached to the for clauses are powerful tools. This is referred to as a list comprehension. The bit at the end, [::-1], is called “stride notation” (as I learned in this post about how to reverse a string). A slice with a stride of n returns every nth member of a sequence. A negative n walks from the end of the sequence toward the start, and you can use that to reverse a string.
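Here is a small sketch of both ideas in isolation. The sample values (the word "python" and the numbers 0 to 9) are made up purely for illustration.

```python
# Assumed sample data, chosen only for illustration.
word = "python"
numbers = list(range(10))

# Slices take the form [start:stop:step]; the step is the "stride".
every_third = numbers[::3]       # every 3rd item, starting at index 0
reversed_word = word[::-1]       # a step of -1 walks backwards

# A list comprehension: an expression, a for clause, and an optional if filter.
even_squares = [n * n for n in numbers if n % 2 == 0]

# Two for clauses behave like nested loops, as in the forum one-liner.
small_palindromes = [x * y
                     for x in range(10, 20)
                     for y in range(10, x)
                     if str(x * y) == str(x * y)[::-1]]
```

The last expression has the same shape as the forum one-liner, just over a smaller range so it is easy to check by hand.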

Negatives of Python:

  • The Python world seems to have impaled itself on a fence between 2.x and 3.x. All the documentation out there is based on 2.x but they are trying to move to 3.2. So, the forums are filled with answers like “in 3.2, you need print(x) instead of print x”
  • When I encounter __main__ and if __name__ == '__main__': my brain shuts down.
  • The included library timeit is the goofiest thing I’ve seen this year. It wants you to put the code to time in a string and call exec on it.
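For what it is worth, timeit.timeit also accepts a callable in place of a string, which sidesteps the code-in-a-string step. A minimal sketch, assuming a palindrome check like the one earlier in this post (is_pal is my own name for it):

```python
import timeit


def is_pal(n):
    st = str(n)
    return st == st[::-1]


# Passing a callable means the code to time does not have to live in a string.
seconds = timeit.timeit(lambda: is_pal(906609), number=10000)
print("10000 calls took %.4f seconds" % seconds)
```

The lambda adds a little call overhead of its own, so this suits rough comparisons better than precise measurements.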

Javascript silo proposals

Posted in Uncategorized by mcgyver5 on March 26, 2010

This post was inspired by a recent Security Now podcast (transcript) that featured John Graham-Cumming and reviews ways to secure Javascript by doing things like separating namespaces and working with subsets of Javascript.

Javascript code commonly lives all together in the same namespace. That is, if you visit a page with included Javascript code from several sources, the code from one can, by default, call code and change code from the others.

Not only can other scripts access the fields and functions of Javascript objects and alter their values, they can also replace the functions themselves with functions of their own.
This is true most of the time and it is a huge security problem.
A common perception from the web is that this doesn’t matter, that security “is a server-side concern”, that fears are overblown, that any restrictions on Javascript will screw up the web 2.0 “revolution”. Whoa.
Consider what a third party script on your page can do:

  1. Pull in additional scripts from anywhere
  2. Make requests to the server
  3. Request information from the user
  4. Get around the same origin policy and share information with anywhere

Think of the ad-based attacks against the New York Times a year ago and then the attacks against Yahoo, Fox News and Google from the other day. These are examples of the well known sites. Think of all the second and third tier sites that will place active content on their pages for a fee. Attackers can just purchase ads!

Closures
Javascript does have a way for a programmer to silo their script. This is called a closure. Closures require some “leveling up” in Javascript skill and this entire article is a must-read. I could not really grasp “closures” until I read an article about closures that did not use the word “closure”. A closure is a function defined inside another function that keeps access to the outer function’s variables. These “inner functions” can be treated like private Java methods.
Simple Javascript Closure Example:
normal Javascript without closures (all functions are “public”)

function getUsername(){
 return username;
}

as part of an object:

var f = {
  username: "tim",
  getUsername: function () {
    return this.username;
  }
};

any code on the page can call f.getUsername(); and get the username

but when using closures

var f = function(){

    // this function is made private: only code inside f's body can call it
    function getCookie_private(){
        alert(document.cookie);
    }
    return {

        // a public variable:
        screenName: "Sue-ellen",
        // a public function
        publicStuff: function(){
            alert("hello everyone!");
        }
    };
}();

That syntax looks weird because of the parentheses at the end. They are important because they make the entire function run immediately and return the object after the “return” keyword, so f ends up holding that object rather than the function. Except this object now has some private parts accessible only by its own functions!

using this idea, you can create namespaces with these anonymous functions:

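// make sure the parent namespace objects exist first
var com = com || {};
com.wordpress = com.wordpress || {};
com.wordpress.captainholly = function(){

///  all kinds of public and private functions and fields
....
}();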


This is explained here.

Closures make it so an “inner function always has access to the vars and parameters of its outer function, even after the outer function has returned.” This makes it possible for com.wordpress.captainholly private methods to keep their knowledge of the object they are part of, so that com.wordpress.captainholly.currentUser.screenName is not a fake idea.
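The same behavior can be sketched in Python, the language used elsewhere on this blog. This is only an analogy to the Javascript above, with made-up names (make_user, login_count): the inner functions keep using the outer function's variables after it has returned.

```python
def make_user(username):
    # login_count is not reachable by name from outside make_user.
    login_count = 0

    def get_username():
        # Still sees 'username' after make_user has returned.
        return username

    def record_login():
        nonlocal login_count
        login_count += 1
        return login_count

    # Only these two functions are handed out as the "public" part.
    return {"get_username": get_username, "record_login": record_login}


user = make_user("tim")
```

Calling user["record_login"]() repeatedly keeps counting up, even though make_user finished running long ago.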

This article carries this idea further to create “Durable” objects — objects that, even though they have publicly accessible methods cannot have them switched out by another script.

Web security cannot rely on every Javascript programmer both understanding and consistently using closures. We need something more. Some products have come out that attempt to silo and otherwise secure Javascript on the fly.

A Google project called Caja is a safe subset of JavaScript. It drastically rewrites code and supposedly isolates functions properly so that other scripts can’t call them. They have a nice testbed that shows the rewritten code and the browser behavior that results.

ADsafe is a technology promoted by Douglas Crockford himself. ADsafe encapsulates included code, forcing it to interact with the ADsafe object as a proxy between it and the rest of the page. External code is only allowed access to the ADsafe object. The ADsafe object then addresses the rest of the page, the DOM etc. in a safe manner.
A sampling of what ADsafe brings, with additions from Douglas Crockford’s Powerpoint presentation:

  • No access to the “document” object
  • Most access to DOM objects through subscripts is taken away. With ADsafe, you have to use ADSAFE.get() and ADSAFE.set() instead of someControl['importantControl']
  • No using the “this” keyword
  • Some other names are not allowed: apply, arguments, call, callee, caller, eval, prototype, valueOf
  • The DOM interface is query based and the scope of queries is limited to the content of the div of the third party widget
  • Guest code has no access to any DOM node.
  • Only the “guest” code has to operate through ADsafe. The home site’s code can still do anything.

Another proposal for buttoning down Javascript is Mozilla’s Content Security Policy (CSP), which proposes to restrict the way JavaScript is used in the browser.
The CSP specs have a long list of features that give granular control over such things as:

  1. A general “allow” directive that lets the website owner define which remote sites can provide resources.
  2. A frame-ancestors directive that allows the website owner to restrict other sites from framing the site. Used properly, this cancels out a big vector for clickjacking attacks.
  3. More granular directives that allow the site owner to define which domains may provide object, css, image, and Javascript elements to the page in question.
  4. Where to report unauthorized access attempts

Settings are controlled through a response header. Using response.setHeader() in Java, here are a few examples:
To allow only resources originating from same domain:

response.setHeader("x-content-security-policy", "allow 'self'");

To allow no other site to act as a frame for the page being loaded:

response.setHeader("x-content-security-policy", "frame-ancestors 'self'");
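The same idea is not tied to Java. As a sketch only, here is a hypothetical minimal WSGI application in Python that sends the first policy above with every response. The header name and value are copied from the Java example; the app itself and its response body are made up for illustration.

```python
# Header name and policy value are the ones used in the Java example above.
CSP_HEADER = ("X-Content-Security-Policy", "allow 'self'")


def app(environ, start_response):
    """Hypothetical minimal WSGI app that attaches the policy to its response."""
    headers = [("Content-Type", "text/plain; charset=utf-8"), CSP_HEADER]
    start_response("200 OK", headers)
    return [b"hello"]
```

Any WSGI server can host app; the point is only that the policy is an ordinary response header that the server side attaches.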

The CSP can be toggled off and on through the about:config interface. It won’t affect pages that don’t have a CSP header and pages that have CSP headers can still run in browsers that don’t implement CSP. Of course, since clients can’t be sure that the web site will enforce CSP and the web site can’t be sure that the browser will implement it, this just amounts to an interesting proto-standard that might help us understand where the web will have to move some day.

Users can protect themselves
While waiting for browsers and servers to implement these protections, individual users can protect themselves by using the NoScript Firefox extension. First of all, NoScript prohibits Javascript from sites you haven’t approved from running at all. On top of that, NoScript includes a module called ABE (Application Boundary Enforcer) that does something very similar to Mozilla’s CSP. When you first visit a website and decide to let NoScript allow scripting on this website, ABE takes over and enforces firewall rules about what outside resources, if any, may interact with the current site.

This post is just about ways to secure Javascript, which is just a part of overall browser security. For a complete treatment of browser security, see the excellent and constantly evolving Google browser security handbook.

dbvisualizer – the very handy SQL client

Posted in Uncategorized by mcgyver5 on March 24, 2010

There are a lot of things to love about dbvisualizer. The tool is stable and fast. It has easy installation, a great SQL editor, built-in support (and drivers) for many databases, quick object and data editing, and most of all simplicity. Somehow these guys made a super flexible but simple and intuitive tool. I compiled a list of all the things I liked and disliked about the program. This list includes some nitpicking. Their people are really good about accepting bug reports and feature requests and I’ve nitpicked to them.

  • One of the most tedious things about any SQL client is drilling down to get to the object you need. DBVisualizer helps by allowing you to drag your most used database objects to a favorites bar for easy access. Favorites items have the same context menus as the regular item in the tree. Opening an item from the favorites bar opens the object tree to the correct place so that it is then easy to access other items near it. This feature gets the most enthusiasm from my coworkers. It is somewhat hidden at first as you need to show/hide the favorites toolbar from the view menu. Unfortunately, there does not seem to be keystroke access to the favorites bar.
  • The grid interface for editing data is well done. It has commonly accepted keystrokes for searching, editing cells, deleting rows, and saving changes.
  • The first upgrade process felt disconcerting because it wasn’t clear if running the exe was going to upgrade an existing install, erase my settings, or what. It created a brand new DBVisualizer install and left the old one and still imported my settings. Not expected behavior. The next day I mistakenly clicked the shortcut to the old version and was asked to upgrade again. “Pretty aggressive release schedule”, I thought.
  • There is complete support for exporting user settings (connections, bookmarks, preferences). This helps when migrating to a new machine or welcoming new employees.
  • Initial install was easy and it automatically detected all my drivers. It also includes the MySQL driver, which it can do because this is a commercial product. This is an improvement over SQuirreL, which for better or worse, cannot package the MySQL driver due to licensing issues.
  • This review would not be complete without comparing it to Navicat, another popular cross-database multi-platform SQL client known mostly as a MySQL client. Navicat has two powerful features not present in DBVisualizer: server performance monitoring and job scheduling. It would be nice to schedule backups inside DBVisualizer, but for the most part, the databases I connect to are managed by other people (DBAs) who perform backups. As a developer, this is not a key feature for me. Server monitoring is the one feature I find compelling about Navicat. It would be nice to monitor server loads. I’m not even sure I’d have the rights to access that info, though. Navicat is also slightly more expensive than DBVisualizer (~$200 for DBVisualizer and ~$375 for Navicat).
  • Explain Plan is beautiful. Unfortunately, I’m using it mostly with PostgreSQL and DBVisualizer does not have explain plan for PostgreSQL. Navicat does have explain plan for its supported databases including PostgreSQL. PGAdmin III and SQuirreL have limited implementations. None match the clarity of the DBVisualizer explain plans. Except not…. for PostgreSQL.
  • DBVisualizer is missing some PostgreSQL specific syntax: commands such as vacuum and analyze are not well supported. Compare this to SQuirreL, which has Vacuum and Analyze in the context menu for each table.
  • Another example of the flexibility of DBVisualizer is the ability to create Folders in the database tab. I organized all the disparate databases my applications use into Folders to make the list more manageable. I found that favorites will not follow as you change your folder structure. So, make your folders first and then your shortcuts in the favorites bar.
  • DBVisualizer isn’t afraid to package proprietary packages such as the yWorks (for the references graphs) and Install4j.
  • DBVisualizer has a monitor feature. This monitor is the kind that monitors your data over time, as in: “this table is growing at 1100 records per week”. This tool is more complex than I can manage for this post.
  • The SQL Editor is really well done. The autocomplete saves tons of time. Jumping back and forth between a “SQL Builder” and the SQL Editor is doable.
  • You can use variables in the SQL Editor. This makes the query ask you for fill-in values when you run it. Makes repeated querying very easy.
  • In addition to data editing, it is really easy to edit table structure. An “Alter Table” dialog provides controls to change all table aspects and generates an SQL statement for you. A problem I encountered with PostgreSQL is that in Alter Table –> Constraints –> Drop constraint, the generated SQL produces a syntax error about the name of the constraint. PostgreSQL requires a slightly different syntax for dropping a NOT NULL constraint, which this tool does not account for as I write this. I reported this to them through the forums and they said they would fix it.
  • By the way, the forums and documentation are really good. I’ve posted 8 or so posts to the forums as I was learning this tool and writing this post. They were all answered within an hour or two… by employees. Not only can you get answers, but find out a whole lot more about this product.
  • Also with PostgreSQL, it can be hard to edit keys or delete duplicate records because PostgreSQL has no rowid (oid) unless you specify one when building the table. One suggestion I have is a way to automatically add OID. Compare this to pgAdminIII – well, no comparison because pgAdminIII has no live editing of tables. So… compare this with SQuirreL, which has a clunky editing interface but still has the ability to delete an identical row. It noticed the duplicates, alerted me to the fact and then went ahead and deleted ONE of the duplicates. I wonder how it knew which to delete? This scenario is exactly the same for MySQL in both tools. The real solution is, of course, to always have a unique key.
  • I also inadvertently set my row limit to 4 for the data grid. No idea how it happened, but it stayed that way and as I navigated around, it affected all my tables. I don’t like the way that worked since it never LOUDLY told me that I wasn’t seeing everything. I now have my row display limit set higher, but if a table exceeds that, I sometimes find myself wondering where the hell my data went. There is no control like “show me the next page of results”.
  • Importing and exporting data via CSV was a bit clunky. I was thrown off by the presence of the previous action’s logs in my import dialog. As another user reported in the forums, it can be hard to know if you ran the import already. The import tool also allowed me to try and import an SQL file as CSV. That did not go well.
  • Using it with embedded hsqldb: it was easy to load the hsql.jar file as the driver. It was less easy to know how to point to the “database file” because there is no actual file. Instead, you are supposed to point it at theName.properties and then remove the “.properties”.
  • There are plenty of configuration options in the Tool Settings dialog. There are even more in a file called dbvis-custom.prefs, where you can disable features and force JDBC to do unexpected things. Making uneducated changes to this file and others in the same directory could really screw up your install. And since it is a Java application, there is a whole galaxy of startup options.

data security through small cell suppression

Posted in Uncategorized by mcgyver5 on December 22, 2009

It seems like the worlds of statistics and Java don’t talk to one another enough.

Small cell suppression is a statistical term for keeping users from inferring what should be private information from public sets of data. For example, consider a survey on athletes with staph infections that was queryable by age, county, sport and race. If there were a statistically small number of Hispanic wrestlers in Otter Tail County, you could probably guess who had a staph infection. So, if a population is identified as statistically vulnerable to this inference, then that data is suppressed.
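The basic rule can be sketched in a few lines of Python. This is only an illustration: the threshold of 5 is an assumed cutoff, the counts are invented, and real disclosure control also has to stop a suppressed cell from being recovered by subtracting the published cells from row or column totals.

```python
THRESHOLD = 5  # assumed cutoff; real rules vary by agency and dataset


def suppress_small_cells(table, threshold=THRESHOLD):
    """Return a copy of table with counts below threshold replaced by None."""
    return {cell: (count if count >= threshold else None)
            for cell, count in table.items()}


# Hypothetical counts keyed by (county, sport); the numbers are made up.
counts = {
    ("Otter Tail", "wrestling"): 2,
    ("Otter Tail", "football"): 41,
    ("Other County", "wrestling"): 17,
}
published = suppress_small_cells(counts)
```

Here the Otter Tail wrestling cell comes back as None and the other two counts are published unchanged.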
The Washington State Dept. of Health page has a pretty good explanation:

Why are small numbers a concern in public health assessment?

Public health policy decisions are fuelled by information. Often, this information is in the form of statistical data. Questions concerning health outcomes and related health behaviors and environmental factors often are studied within small subgroups of a population. Continuing improvements in the performance and availability of computing resources, including geographic information systems, and the need to better understand the relationships between environment, behavior, and consequent health effects have led to increased demand for data on small populations. These demands are often at odds with the need to preserve privacy and data confidentiality. Small numbers also raise statistical issues concerning the accuracy, and thus usefulness, of the data.

In general, problems with confidentiality arise when there are small denominators (population size represented in a specific cell in a table); and, problems with data reliability arise when there are small numerators (cases in a specific cell in a table).

Definitions
The broader term for these controls is “Statistical Disclosure Control”. The challenge is to use optimal levels since too little control leaks private data and too much control makes published survey data useless.
“Imputation” is the practice of substituting values for missing data items. If we are leaving out data to protect confidentiality, then substitute data must be imputed so as to not skew the overall results.
“Inference” is the practice of finding secret data in published survey results. By measuring inference, we can find out if disclosure control is an issue.
Spearman’s Rank Correlation: a statistical tool for inference. It can find out how closely two variables are tied. This web page will perform this correlation for you (if you are ready to hand type your data into a web form).
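For readers who would rather not hand type data into a web form, Spearman's coefficient is short enough to sketch directly. The version below ranks each variable (tied values share the mean of their positions) and then takes the ordinary Pearson correlation of the two rank lists. The function names are my own, and it assumes both lists have the same length and neither is constant.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A result near 1 or -1 means the two variables move together (or in opposite directions) almost monotonically; a result near 0 means there is little monotonic relationship.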

I could only find one tool related to this in the Java world. I’m surprised it isn’t more of a booming field since it touches on survey data, health and financial data, and security and privacy. Is that too small of a niche? I doubt it. Inattention to the dangers of leaking information in this way could potentially cause a lot of harm and cost a lot of money.

The stats package SAS has small cell suppression features. This document (Word Doc) discusses how to deal with the holes in the data that result from suppression.

So, how to have this feature in my Java app?
R = the open source statistical package
CRAN = a list of packages for use with the R language
sdcTable = an R package for statistical disclosure control for tabular data
lpSolve = an R package that sdcTable depends on
rJava = an R package that allows R to create Java objects and, through the JRI package that is now part of rJava, allows Java to run R in a single thread and make calls to it.
JGR = a Java GUI tool that makes use of rJava for a Java GUI interface to R. R binaries must be installed and the JGR jar then allows Java to call it. The source of JGR has good, production quality examples of how to call R from Java.
Using all that, one should be able to create an ad-hoc query front end for survey data, run submitted queries through small cell suppression rules in R, and return safe data.
There, I solved your small cell suppression problems. I’ll leave the details to the reader. What could be easier than integrating a stack of open source C and Java projects into your web app? Or, rather, tune in for part II: implementing this stack O’ fun.

IntelliJ Idea: Notes on switching

Posted in Uncategorized by mcgyver5 on December 19, 2009

I recently switched to IntelliJ IDEA after working primarily with Eclipse/MyEclipse. These are some large and small obstacles I hit and how to overcome them.

  1. I want to ignore persistence framework errors. Go to Project Structure –> JPA facet and delete Data Sources Mappings (but not JPA Configuration Descriptor!)
  2. Web application doesn’t reflect changes to html, xhtml, jsp, etc. Go to Project Structure –> Java EE build settings. Make sure the Exploded Directory Project compile output path is the same one the server is using (i.e. where your project lives on disk). Also make sure the compile output path is the same as where your project lives and not some crazy IntelliJ-invented directory.
  3. I want editor to be linked with menu, like in Eclipse. This is autoscroll from source, a button in the top row of the project pane.
  4. I used Ctrl-shift R (for resource) all the time in Eclipse. In IntelliJ IDEA, the same function is CTRL-Shift-N (for name)
  5. Auto complete does not work! In my case, this was due to the La Clojure plugin (0.2.172) When I disabled this plugin and restarted, autocomplete (and several other features) came back. A web search on this turned up nothing. Maybe now it will.
  6. How to integrate CVS
    • If CVS is not connected, go to Version Control –> CVS –> Configure CVS Roots –> Test Connection. This appeared to reset the connection for me.
    • To set up a CVS repo: Version Control –> CVS –> Configure CVS Roots –> click the “plus” button to make a new root. Enter your CVS info.
    • Import an existing project into IntelliJ IDEA: File –> Open Project –> browse to find the .pom file
  7. How to get vim keyboard mappings in IntelliJ. Go to Settings –> Plugins –> Available –> right click on IDEAVIM to install. The step I skipped screwed me up big time: you must copy the keymap file according to these directions.
  8. I hacked the authentication mechanism on an app so I wouldn’t have to log in every time during testing, and I was afraid I might accidentally commit it to CVS. So I had to ensure this file never got mixed in with the rest of our code. This is a CVS question rather than an IntelliJ IDEA question, but the answer is to create a new branch. Right-click on file –>CVS–>create branch (name it “DEAD_BRANCH” or something) and check the “Switch to this branch” box. The next time you go to commit that file or the directory it is in, that file will show up as [switched to tag DEAD_BRANCH] and if committed, will only be committed to that branch, so that your co-workers, when they update, will not get your screwed up file.
  9. Keystroke goodness. The following keystrokes are indispensable. For a complete keystroke chart, go to help –> keystroke reference
    • move lines or blocks of code. This comes in handy on almost a daily basis and for some reason isn’t in the keystroke chart. Ctrl-Shift up arrow moves a line or selected block up. Ctrl-shift down arrow moves a line down. If it does not work, try hitting escape.
    • IntelliJ has a history of clipboard (buffer) contents. To paste from it, use Ctrl-Shift-V
    • Rename: Shift-F6
    • Generate Getters and Setters: Alt-Insert
    • Find usages: Alt-F7
    • Duplicate Line or selection: Ctrl-D.

Alternate languages on the JVM?

Posted in Uncategorized by mcgyver5 on October 7, 2009

I’m trying to summarize several discussions about alternate languages on the JVM that I absorbed at the No Fluff Just Stuff conference. Can I become a language evangelist based on a weekend at a conference? I suppose not, but there were a lot of compelling arguments for why we should be looking at some of these new functional languages on the JVM. It was put forward that most of the reasons we like Java have to do with the JVM and not with the Java language:

  1. Cross platform
  2. stability
  3. Performance
  4. security
  5. huge world of libraries

These will hold true with any language that compiles to the JVM.

Why are they even considering new languages? Multiple reasons bubbled up from the conference as a whole.

Extensibility.
The example discussed was Hadoop. It is an open source framework that handles huge amounts of data in a distributed way. It is inspired by Google’s MapReduce papers. They evidently found some of the core Java classes insufficient for their needs. If you look at the docs for org.apache.hadoop.io.Text, it says, “It provides methods to serialize, deserialize, and compare texts at byte level…. In addition, it provides methods for string traversal without converting the byte array to a string.” Does this point to an extensibility problem in Java? If not, why couldn’t they reuse any code from String? Someone at the conference asked: why can’t I make Object define toXmlString() so that every one of my classes that descends from Object automatically has a toXmlString()? This is extensibility and Java doesn’t do it as completely as some other languages might.

A language shouldn’t limit what you can do. Certain language constructs not available in Java (closures, switch statements, folding) enable developers to be far more efficient.

OO might be failing us. We try to think of Objects as changing in place. Rich Hickey, the creator of Clojure, rejects this: “The future is a function of the past, it doesn’t change it.” We should stop thinking of data as persisting and changing over time and instead recognize that a thing is immutable, and when it changes it becomes a different immutable thing. Like a date, or an account balance. The state of an account at a point in time is immutable. Adding money to it does not change it, it creates a new state. This 55 minute video of Rich Hickey explaining some of these ideas was recommended at the conference and is amazing. As he explains, all of our concurrency problems come from the notion of objects changing in place.
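The account example can be sketched in Python with a frozen dataclass. This is my own illustration of the idea rather than anything shown at the conference, and AccountState and deposit are hypothetical names: a deposit does not modify the old state, it returns a new one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AccountState:
    """A snapshot of an account at one point in time (hypothetical type)."""
    owner: str
    balance: int


def deposit(state, amount):
    # Build a new snapshot instead of mutating the old one.
    return replace(state, balance=state.balance + amount)


monday = AccountState(owner="sue-ellen", balance=100)
tuesday = deposit(monday, 50)
```

After the deposit, monday still describes the account as it was, so code holding on to it (another thread, say) never sees it change underneath it.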

No Fluff Just Stuff – twin cities

Posted in Uncategorized by mcgyver5 on October 5, 2009

I learned a bunch of neat stuff over the weekend at NFJS. It was a wonderful combination of filling in the gaps for tools I use all the time and trying to show us what is coming in the future. The future, everyone agreed, was in alternate, functional languages on the JVM. I’ll talk about why in a separate post. The non-tech talks were all about agile development. At the end my brain was all stretched out and floppy. Today I want to go in a million directions at once.


struts form boolean checkbox

Posted in Uncategorized by mcgyver5 on September 22, 2009

We all understand that when a checkbox is not checked on a form, it is not present in the request object. This is the basis for many headaches in web application programming, especially when using multiple form pages. When using multiple form pages, as in a wizard, the struts way around is to have a reset() method that contains some logic for setting the value to false if it doesn’t exist in the request. Again, this applies to situations with a session scoped form.

The documentation for the html:checkbox tag says:

WARNING: In order to correctly recognize unchecked checkboxes, the ActionForm bean associated with this form must include a statement setting the corresponding boolean property to false in the reset() method.
In practice, the only properties that need to be reset are those which represent checkboxes on a session-scoped form. Otherwise, properties can be given initial values where the field is declared.

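@Override
public void reset(ActionMapping mapping, HttpServletRequest request) {
    // an unchecked box sends nothing, so default to false before the request is applied
    this.citizen = false;
}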

There are several confusing posts out there in forums about how to populate checkboxes when viewing forms with existing data. One says to have a hidden form field with the same name as the checkbox. Another has us jumping out of struts and using regular JSP tags with logic. Both of these are unnecessary and have potentially bad repercussions later.
The real solution is to use an html:checkbox with a name equal to that of a bean and the property equal to the name of the boolean variable in that bean that the checkbox captures. The following will check or uncheck the checkbox depending on the value of “citizen” in the applicantBean:

<html:checkbox name="applicantBean" property="citizen" value="true" />

For this to work, your code must create an empty applicant bean before loading the blank form, or struts will whine that there is no such thing as “applicantBean” in any scope.

how to use Apache Bench (ab) to test a page that requires login

Posted in tomcat, Uncategorized by mcgyver5 on September 10, 2009

ab is a tight and effective tool for load testing web applications. It comes with every install of Apache httpd.
If a page is behind a login screen, you can use the -p flag to define a file that contains post variables for login and password:


C:\Apache2.2\bin>ab -p C:\posts\post.txt -T application/x-www-form-urlencoded -n 1000 -c 22 http://myServer:8008/myapplication/CentralCashier/userLogin.do
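The file passed with -p has to contain the form fields already URL-encoded. One way to produce it is a few lines of Python; this is a sketch, and the field names login and password and their values are hypothetical, so substitute whatever your login form really posts.

```python
from urllib.parse import urlencode


def build_post_body(fields):
    """URL-encode form fields to match -T application/x-www-form-urlencoded."""
    return urlencode(fields)


def write_post_file(path, fields):
    # ab sends the file contents as the request body, so no trailing newline is written.
    with open(path, "w") as handle:
        handle.write(build_post_body(fields))


# Hypothetical field names and values; use the ones from your own login form.
body = build_post_body({"login": "testuser", "password": "p@ss word&1"})
```

Calling write_post_file with the path of your post.txt and the same dictionary writes that body to the file ab reads.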

If a page is only accessible by a logged in user, not directly accessible from the login page, then you can use the -C flag to define a cookie. You have to get the value of the session identifier cookie from a valid session. Use a proxy like Webscarab or Paros to capture a request and copy the JSESSIONID=xxxxx from the request and use it with ab:


C:\Apache2.2\bin>ab -C JSESSIONID=36D5AE14223E1D4ED0B2BBC5C7F411EA -n 1000 -c 22 http://myServer:8008/myapplication/CentralCashier/userSearch.do?method=search

Alternatively, you can just turn off the authentication filter for the purposes of your test.

Evaluating WebScarab

Posted in security, spring, Uncategorized by mcgyver5 on July 29, 2009

I was asked to do a security assessment on a co-worker’s ColdFusion application. It is protected on every page by a NOT findnocase(cgi.http_host,cgi.http_referer) check to ensure the request came from the same domain. This is a good way to prevent forced browsing and most URL injection attacks because if you mess with the URL, this check knows it and stops all the shenanigans.
This is where a proxy comes in. I’ve worked a bunch with Paros and some with Burp, but my employer does not allow me to download these without some extra paperwork. Webscarab, for some reason, is allowed. Webscarab is written entirely in Java, has a zippy UI and has widening adoption.

Webscarab allowed me to do forced browsing on the application and learn that the application relied solely on that domain check to make sure the user was authenticated (That is, they could only get to the site through the login form). Webscarab also allowed me to find many XSS bugs.
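The weakness of relying on that check alone is that the Referer header is supplied by the client, so a proxy or a script can set it to anything. A sketch in Python that only builds a request object and sends nothing; the host name and paths are hypothetical placeholders.

```python
from urllib.request import Request

# Hypothetical host and paths, used only for illustration.
HOST = "intranet.example.com"

request = Request(
    "http://%s/admin/report.cfm" % HOST,
    headers={"Referer": "http://%s/login.cfm" % HOST},
)
# Nothing is sent here; the point is that the client chooses the Referer value.
```

A server that compares the host against the referer would see a matching pair on a request like this, which is why the check cannot stand in for real authentication.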

Webscarab is infinitely scriptable (with beanshell).

Webscarab has a tool that evaluates session identifiers for their strength. I would guess that most web frameworks these days have very strong session identifiers. In fact, I challenge anyone to find an example of a weak session identifier on any web app that shouldn’t be replaced anyway for one hundred other reasons.

Startup Options
Webscarab starts in Lite mode, which is just the web proxy, by default. To get the full meal, you have to start with java -DWebscarab.lite=false -jar webscarab.jar
Default memory is 64MB and this can get used up quickly. Online examples show Webscarab having ~510 MB available. This is achieved by adding -Xms32m -Xmx510m to the Java startup args. Just like with some other Java desktop apps (like IntelliJ IDEA) you can click on the Green|Yellow|Red bar along the bottom of the window to force garbage collection and free up some memory.

Things That Could Be Improved:

  1. Inconsistency: Some features are available through a right click, some through a double click, some from a menu item and others from buttons or tabs somewhere on the screen. Some fields look editable but aren’t. Some are editable on one click, others on two. Some edit fields select the whole field when clicked, but typing appends to the end of the existing entry.
  2. Other screens have a delete button. Not the Proxy Listener Tab. To delete a listener you must stop it. If a listener fails to start, it cannot be stopped and so cannot be deleted. I have to stop any other service using the same port as my listener, THEN start my listener, and THEN stop my listener to delete it.
  3. The interface for getting rid of conversations is difficult to use. Webscarab can fill up pretty fast with banal conversations and the only easy way to get rid of them all is a restart. There is a Tools –> remove conversations menu item, but no regex that I enter seems to get rid of conversations.
  4. There should be some way to construct the proxy filters based on existing requests. By this I mean when a request is trapped that you never want to see again, you can flag it in some way to add it to the ignore list.
  5. Judging from several posts to the mailing list, Webscarab only works with Sun’s brand of Java.

To address user experience as well as other issues, Webscarab is undergoing a total rewrite. This is currently known as Webscarab NG. They will be using the Spring Rich Client Platform. The new product also has database integration. This is a work in progress and needs lots of testing. So, if you are looking for an open source project to help, this would be an excellent choice. According to the email list, the Webscarab NG project leader has been directing his work at the OWASP Proxy lately. Even though Webscarab NG is in development, development also continues on the current Webscarab.
