When we browse the web, we are usually looking for something, whether for fun, school, or work. We might google once and land on the right page if we are lucky, or we might have to google several times and visit different pages to get the whole picture. Wouldn’t it be nice if the browser knew what we might need, helped us put together as much of it as possible, and saved us some clicking?

Browsers are getting smarter, with more built-in features and plenty of third-party plugins and extensions to install. The most common example may be automatic dictionary or glossary lookup. With MashLogic, whenever you mouse over a term, a popup window tries to tell you what it’s about by gathering information from many other sites. BlueOrganizer can automatically add a smart link next to a link to a book or piece of music you might be interested in, and tell you its popularity, where to get it, or even what your friends think of it. Deckkr lets you stay on the same page while still navigating other sites. There are many other nice tools that make browsers smarter and more fun, but what exactly would be a killer application for smart browsing? That is up for debate, and the answer depends on the purpose.

What more can we expect from a browser? Most search engines can give us a page with multiple results. A good vertical search engine can give us more, and better organized, results from different sources. Still, the browser has no idea about the differences across pages and domains. It is only a window to the WWW. To make the browser smart, data awareness is the key component: it has to be able to digest the content and recognize special objects and patterns. Take apartment search as an example. When a user is browsing a page with apartment listings (say, a landlord site), the browser can show other “similar” apartments in the proximity of those on the current page. Or, with specific knowledge about the apartment domain, such as rent, bedrooms, and location, it can store the results we are interested in more usefully: not only via traditional URL bookmarking, but like a shopping cart that knows the structure of the data objects it remembers, and can even compare and analyze them. With such “data awareness”, the browser can become a vertical search engine for apartments (or any data of interest).

Smart browsing can be applied to all kinds of domains, not just apartment search. Using the same framework and backend system, with a little modification, we should be able to deploy it to other areas, like regular shopping. Shopping for an apartment can easily be replaced with shopping for a book, a car, or a service. The main difference is the domain-specific knowledge. The apartment domain is our current focus, and there is still room for improvement. Enabling smart browsing is one of the goals we are working toward. We want to make apartment search just one click away and, eventually, make search a totally different experience!

We see the possibility of merging search with browsing, although it is still unclear how smart browsing will materialize. Everyone might have different ideas on how to make browsers smart and what the most wanted features should be. We are still shaping our products in this direction, so what do you think? What would you like to see?

Yuping Tseng.

Posted in General | No Comments »

Search and Cazoodle

November 26th, 2008

On the Web, users search for various things. While general search engines like Google, Yahoo and MSN are excellent at answering many keyword queries, they do not provide answers to all user queries. Marissa Mayer illustrates the limitations of search engines quite beautifully in her blog by describing what she was up to on one of her weekends. She says:

…This past Saturday, I kept track of the things that came up in conversation that I wanted to search for right then but couldn’t:

Are “fab,” “goy” and “eely” words? (There was a Scrabble game going on.) What time does J.C. Penney open on Saturday? Which school has a team called the Banana Slugs? What is the team mascot for San Jose State? How much power does that hydroelectric dam generate? What do you call a group of turkeys?….

Users are not always looking for web pages as results. Often they are looking for the answer to a question, as Marissa Mayer mentions above. Websites like hakia.com and ask.com are developing semantic search that can provide relevant results for queries like “which is the largest continent?” For some queries, like “weather new york”, Google also provides a direct answer. The trend is clearly toward understanding user queries better in order to provide semantically relevant results.

But what if the user query is even more complicated? Consider the query “2 bedroom apartments in New York”. The user expects the search engine to return a list of apartments matching these specifications. Instead, Google returns a list of landlords or sites that can only get the user started. To complete the search, the user needs to visit each site and search it for apartments. If a site is on the deep Web, the user must interact with its query form, go to the result page, and note down the results. Needless to say, the overall process is tedious and time-consuming.

Traditional search engines do not perform well on such complicated queries because they view the Web as a collection of “pages”, whereas the Web is more like a collection of real-world objects such as apartments, jobs, and restaurants. When a user searches for “2 bedroom apartments in new york”, they are not looking for pages containing these keywords but for actual apartment listings in New York. Answering such queries involves finding sources that contain apartment listings, understanding attributes like bedrooms, integrating data from multiple sources, and even accessing deep Web sources. This is where vertical search engines come into the picture.

In this video, Michael Yang, CEO of Become.com, Suranga Chandratillake, CEO and CTO of Blinkx.com, and Gautam Godhwani, CEO of SimplyHired.com, talk about vertical search trends. Vertical search engines, which integrate domain-specific data to provide users with relevant results, are the next wave of search engines dominating the search market. Two notable vertical search systems are Simplyhired.com and Indeed.com in the jobs domain. These two sites collect millions of job postings from the web and provide search over them. Surprisingly, we do not find similar vertical search systems in other domains like rental apartments.

One might ask: “Aren’t there already vertical sites that provide specialized data, like apartments.com for rental apartments, monster.com for jobs, or even craigslist.org?” Well, they do provide specialized data, but the key difference is that they do not collect it by “crawl and search”, the way Google does for web pages. Instead, they are feed-based: they collect data through user postings or feeds. In most domains, including rental apartments, there are many big players in the “feed-based” category but very few search systems.

There is a lack of vertical search systems in many domains, and this is where we at Cazoodle can contribute. One reason we do not see many vertical search systems today is that it is technically challenging to extract and integrate data like apartment or job listings from hundreds of thousands of sites in a scalable way. At Cazoodle, we have developed a platform to quickly instantiate specialized vertical search systems. The platform abstracts out the challenges that are common to verticals across domains, which makes the remaining customization very rapid. Our first offering is Cazoodle Apartment Search, which we are now expanding aggressively to new locations, and we are also expanding into new vertical domains.

We hope to provide Google-like search systems for various domains, starting with rental apartments, to help users find information on the Web much more easily.

Govind Kabra


The new trend among web 2.0 websites is that they are becoming more dynamic. Dynamic effects make a website more usable, respond quickly to the user, and connect users in real time with data and with each other. Improvements in JavaScript performance and attempts to create easier-to-use libraries make this a lot easier. The JavaScript libraries are still in a fledgling state, though. There is no standardization among them yet, and there are not enough technically trained people for them to be widely used.

We do not see this as a setback for working with JavaScript libraries. In the apartments domain, except for the recent map-based mashups, most sites are largely static web 1.0 pages. It is not unusual for a user to submit three pages of forms before reaching the results, which means there is still a lot of room for improvement. The recently popularized web 2.0 applications are simple in nature and self-contained, and they do not add unnecessary content (e.g. selling unrelated services to users or showing too many ads). You’ve probably stumbled upon some of them: mint.com and rememberthemilk.com, to name a few. They are self-contained applications operating on the web rather than websites in the traditional sense. Besides simplicity, I think their major advantage is that they don’t demand too much from users or distract them. Submitting a form demands something from the user, as does waiting for a page to load or being asked to refresh the page. Giving immediate feedback is the best user experience possible.

Building a self-contained JavaScript application is not an easy task, but there are many options out there to help you, from simple libraries like jQuery and Prototype to more comprehensive frameworks like Ext JS, YUI, and Dojo. The simpler the library, the easier it is to customize. To brand a website for your company, I suggest you go for the simpler libraries so you can build an exact look and feel. The more complete frameworks have their own predefined layouts and looks for panels, so your brand can be confused with other websites built on the same framework.

On the web programming side, there is more demand for trained programmers than for traditional web wizards who come from a design background. Traditionally, programmers have thought of JavaScript as a tool web designers use to patch up their websites, so many never learned it seriously. Nowadays, however, JavaScript libraries are built upon classes, inheritance, and so on, and the whole application is structured into modules and classes. It is no longer a web wizard’s job, but rather a job for traditional GUI designers. Whichever background you come from, you need to brush up on JavaScript 101 for web 2.0 applications, and it requires everything you’ve learned about programming. The technology is volatile, but as long as we keep evolving, we will always be on the cutting edge.

Paul Yuan

Posted in Technical | 2 Comments »

As a vertical search engine, Cazoodle has been doing data processing with Hadoop MapReduce. Like many others, we use MapReduce for crunching large-scale data, for analysis tasks such as data annotation and page classification. However, with data at this scale, some problematic records may fail and then crash the entire MapReduce task. As we don’t necessarily know where these culprits are, we generally use a try/catch block to catch them.

Code:
try {
... process data
} catch (Exception e) {
... handle errors, or just skip it.
}

But what if we use native code that may crash the whole JVM? For example, when we call a native C library through JNI, there may be null pointer errors in some rare cases. Or some records may lead to an OutOfMemoryError, which then leaves the JVM in an unpredictable state, like the following:

Code:
try {
... process data
// Oops, the JVM crashes sometimes
} catch (Exception e) {
// Sorry, no exception is caught because the JVM has crashed.
}

This looks like a bug in the code; however, sometimes the behavior is unavoidable. For example, what if we are using a third-party library that we cannot change?

So our goal is to skip problematic records when the task is re-run in another Mapper/Reducer. However, the task may be assigned to another machine in the cluster, so using a local log to skip the record is impossible. Here the popular Memcached comes in handy. The code becomes:

Code:
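String cacheKey = String.valueOf(key);
// add() fails if the key already exists, i.e. an earlier attempt died on this record
if (!memcacheClient.add(cacheKey, Boolean.TRUE)) {
... log the record
return; // skip it (assumes this code runs inside the map/reduce method)
}
try {
... process data
// Oops, the JVM crashes sometimes
} catch (Exception e) {
// Sorry, no exception is caught because the JVM has crashed.
}
// Reached only if the JVM survived, so the marker can be cleared
memcacheClient.delete(cacheKey);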

Whenever a record crashes the JVM, its key is never deleted from the cache. When the Mapper is re-run, that record is logged for further debugging instead of being processed (and crashing the JVM again). The only requirement for this approach is that each record has a unique key to store in the cache.
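To make the re-run behaviour concrete, here is a minimal, self-contained sketch of the same idea. It is not our production code: a `ConcurrentHashMap` stands in for the shared Memcached server (its `putIfAbsent` mimics the "add only if absent" behaviour), and a thrown `SimulatedCrash` stands in for a JVM crash. The class name, method names, and the "bad" key prefix are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SkipBadRecords {

    /** Stands in for a JVM crash: nothing in the processing loop catches it. */
    static class SimulatedCrash extends Error {
        SimulatedCrash(String key) {
            super("crashed on record " + key);
        }
    }

    /**
     * Runs one task attempt over the given record keys.
     * The cache parameter is a local stand-in for the shared Memcached server.
     * Returns the keys that were skipped because an earlier attempt died on them.
     */
    static List<String> runAttempt(ConcurrentMap<String, Boolean> cache,
                                   List<String> keys,
                                   List<String> processed) {
        List<String> skipped = new ArrayList<>();
        for (String key : keys) {
            // putIfAbsent mimics a cache "add": it only succeeds if the key is absent.
            if (cache.putIfAbsent(key, Boolean.TRUE) != null) {
                skipped.add(key); // marker left behind by a crashed attempt
                continue;
            }
            if (key.startsWith("bad")) {
                throw new SimulatedCrash(key); // the marker is never removed
            }
            processed.add(key);
            cache.remove(key); // record survived, so clear its marker
        }
        return skipped;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Boolean> cache = new ConcurrentHashMap<>();
        List<String> keys = Arrays.asList("a", "bad-1", "b");
        try {
            runAttempt(cache, keys, new ArrayList<>());
        } catch (SimulatedCrash crash) {
            System.out.println("first attempt died: " + crash.getMessage());
        }
        // The re-run sees the leftover marker and skips the bad record.
        List<String> processed = new ArrayList<>();
        List<String> skipped = runAttempt(cache, keys, processed);
        System.out.println("skipped=" + skipped + " processed=" + processed);
    }
}
```

In a real job the marker has to live outside the task's JVM, which is why a shared cache is used rather than an in-process map like the one in this sketch.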

Thanks to the high performance of Memcached, there is almost no overhead. In our experience, one memcache server can handle 300 tasktrackers in a job that parses web pages, with internal bandwidth usage below 100 kb/s. Now we can spend more time developing a better service rather than fighting with problematic records. :D

Remark: In Hadoop 0.19.0, there is a new feature to skip records which can not even be read (http://issues.apache.org/jira/browse/HADOOP-153).

York Tsai

Developer Team

The real estate market is clearly feeling the pinch of the bad economy. According to compete.com, web traffic to real-estate companies like Trulia and Zillow decreased in September by 11.8% and 2.2% respectively. Recently, Zillow decided to lay off 25% of its workforce. Rich Barton, CEO of Zillow.com, said in his posting titled “Difficult times, Difficult decisions” on October 17, 2008:

This week we are reducing our workforce by 25%. This was an incredibly painful decision for me and the leadership team, but, in the end, we concluded that we had no choice but to securely batten down the hatches as we sail into a major economic storm.

Redfin laid off 10% of its workforce earlier in the month. These are clearly bad times for real-estate companies.

Twelve months ago, many CEOs were still optimistic about growth in real estate. I remember that back in November 2007, at the ILM (Interactive Local Media) 2007 conference hosted by the Kelsey Group, the CEOs of real estate companies like Zillow and HomeThinking were very optimistic in their outlook on real estate traffic and online revenue. Obviously, they were oblivious to the ongoing trend. They cannot be blamed for misjudging it: even the experts on Wall Street were taken by surprise by the current turmoil. It was extremely difficult to predict that we would see the worst economic slump since the Great Depression.

The rental market is faring better than real estate. According to the latest Harvard report on “State of the Nation’s Housing in 2008“,

Rental housing is reasserting its importance in US housing markets. With so much turmoil on the for-sale side, many households have reconsidered their financial choices and opted to rent rather than buy.

These days it is getting increasingly difficult to get a loan to buy a home. People who are not able to secure credit to buy homes are turning to rental housing. In a way, the collapse of the real estate bubble is turning into greater demand for rental housing, and people are looking for affordable rentals.

We hope to do our share to help people find their new place easily. Cazoodle Apartment Search provides comprehensive listings for major metropolitan areas like New York City, the San Francisco Bay Area, Chicago, and Los Angeles, and we are expanding quickly to other locations like Miami, Philadelphia, and Detroit. Our plan is to cover the top 25 metropolitan areas by early next year. We are seeing traffic increase month over month and hope to continue the growth as we expand to new locations and improve the overall system. Cazoodle is a group of immensely smart, talented and passionate people who are constantly working to achieve their vision.

Arpit Jain

Developer Team

Posted in General | 2 Comments »

How we started Apartment Search

October 26th, 2008

The idea of apartment search came a long time ago, when I was an undergraduate taking the database course taught by Kevin Chang (founder of Cazoodle and CS professor at U of I). My team chose to build an apartment search for the class project, and we innovated by using the just-launched Google Maps API. At the time, we were manually entering apartments into the database, and we asked Kevin how we could gather more data. The traditional process, he said, was to hire programmers to write parsers for sites, and that effort could be greatly simplified in the future.

It was quite clear from my undergraduate project that collecting apartments is a technical challenge, and that integrating scattered apartment websites can bring real value to our school community and beyond. I noted his idea but did not know what advanced research would be required to get us there.

Right before my graduation in spring 2007, Cazoodle launched with a powerful data extraction tool that can build wrappers easily, and Apartment Search launched with bountiful data. Although it was created by a different team of 20 people and was unrelated to my undergrad project, I was happy to see the opportunity the company had.

Launched in Champaign, Apartment Search has gained wide coverage of apartments in this area. Popular sites like apartment.com had 5 apartments for Champaign, and Google Base had 19. We have collected hundreds of apartments, covering popular and lesser-known landlords. Our approach is bottom-up: we expand location by location, crawling landlords and listing sites. The state-of-the-art apartment sites are using the latter approach, and with our technology advantages, we use data crawling to cover apartments comprehensively and exhaustively. Why build another apartment search? In short, we learned the Google way of making apartment search more comprehensive. We aim to introduce a one-stop portal for apartment search that not only benefits the user, but also drives traffic to other apartment sites to complement their service.

It will require a leap of faith to believe that we can scale up to compete with the big players. In the past year, our technology has matured, and we can now scale to a new location in a short time. The dream of covering the entire US map is no longer out of reach. With every location targeted to be as comprehensive as Champaign, I hope every apartment you search for online will be in our database in the near future.

Paul Yuan

Developer Team

Posted in General | 1 Comment »

Boston Launched

October 3rd, 2008

As promised, we have added Boston as our next location. Coverage extends 30 miles around Boston, from Haverhill in the north to Marlborough in the west and Bridgewater in the south, with, obviously, the Atlantic Ocean to the east :). We have collected ~8000 apartments from more than 70 Boston sources. From local landlords to big national sites, we have them all. Boston launched a little later than our target date of Sept 28 because our data crawling team was moving to a brand new office. Moving always takes time, especially when you have to set up everything from electricity to computers yourself. Well, now everything is set, and our data crawling team is happier than before and has promised to work harder :)

The next location will be Dallas. We recently finished collecting the apartment sites for Dallas and, to our surprise, found more than 100 sources, even more than for Boston! We were expecting Dallas to be smaller than Boston. After all, Boston is more popular, right? But not to worry: we will work more hours and finish collecting Dallas apartments by the end of October. After that, by popular demand, we will work on Miami.

You can suggest your location at http://apartments.cazoodle.com and beat the highest number of votes.

Cazoodle Apartment Search has expanded and now also covers Chicago (~29000), New York City (~30000), Los Angeles (~15000), and Seattle (~3500), in addition to the SF Bay Area (~14500) and Urbana-Champaign (~625). We plan to add one new location every 3-4 weeks. Boston and Dallas will be the next locations that we add.

Your location not covered? You can vote for your location at http://apartments.cazoodle.com and we will add it to our list.

If you have any feedback on the site, please submit it here.

Thanks,
Arpit Jain
Developer Team

We found that the site was taking a long time to load, especially the “Narrow your search” panel. We added load balancing, enabled caching, and optimized the code, and the site now loads much faster.

If you still have problems loading the website, please let us know.

Arpit Jain
Developer Team

New Features Added

May 17th, 2008

We have added many new features to the site:

1. Apartments from different sources are merged: If an apartment is listed on multiple websites, the listings are merged and shown as a single apartment. In the infowindow, you can still see all the other sources.

2. The “Narrow your search” panel resizes itself: Depending on the number of options in the panel, it resizes itself. This removes a lot of pain.

We are continuously updating the website and adding new features. If you have any suggestions, please let us know.
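For readers curious how the merging in item 1 might work, here is a rough sketch. It assumes, purely for illustration, that two listings are the same apartment when their normalized address and bedroom count match; the actual matching rules are not described in this post, and the class, field, and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ListingMerger {

    /** A listing as found on one source site (illustrative fields only). */
    static final class Listing {
        final String address;
        final int bedrooms;
        final String source;

        Listing(String address, int bedrooms, String source) {
            this.address = address;
            this.bedrooms = bedrooms;
            this.source = source;
        }
    }

    /** Assumed duplicate key: lower-cased address without punctuation, plus bedrooms. */
    static String mergeKey(Listing listing) {
        String normalized = listing.address
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", " ")
                .trim();
        return normalized + "|" + listing.bedrooms;
    }

    /** Groups listings by merge key and keeps every distinct source for each apartment. */
    static Map<String, List<String>> mergeSources(List<Listing> listings) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Listing listing : listings) {
            List<String> sources =
                    merged.computeIfAbsent(mergeKey(listing), k -> new ArrayList<>());
            if (!sources.contains(listing.source)) {
                sources.add(listing.source);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Listing> listings = Arrays.asList(
                new Listing("101 Main St.", 2, "siteA"),
                new Listing("101 main st", 2, "siteB"),
                new Listing("5 Oak Ave", 1, "siteA"));
        // Each map entry is one merged apartment with the list of sites it came from.
        System.out.println(mergeSources(listings));
    }
}
```

Keeping the list of sources per merged apartment is what lets the infowindow still show every site where the apartment appears.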

Arpit Jain
Developer Team