Project: Research Issues in Web-Based Database Applications

Student Researchers:
Phetsamone Bounsong, Tia Chung, Xiaoguang Liang
Advisor: Prof. Bettina Kemme
Institution: McGill University




Goals:

With the extraordinary growth of the World Wide Web, the Internet has become one of the most popular means of exchanging information. Many web applications today work with a database in the backend. Combined with databases, the web allows users to access stored data or electronic documents easily and inexpensively. Databases can serve as a repository for information or be used to generate HTML dynamically, so the range of uses for web-based database applications is very broad. The goal of the project was to develop a better understanding of web-based database applications, study the different technologies available and their applications, and provide tools to help others develop such web applications. The main focus was on the usage of XML in such an environment.

Account of the Process:

The project began with the study of an existing web-based database application called Proteomics. The Proteomics database and web application had been started by two other students but was never completed. It was intended to help scientists share their information and findings across the web. Users are able to upload files to and download files from a common server, depending on their permissions. Proteomics was built using Java with an ORACLE database. Looking at an existing prototype helped us understand more quickly the principal architecture of web-based database systems and the development process of such an application. A considerable amount of time was spent studying the Proteomics project, and many problems were encountered when trying to start the application. In an attempt to clean up the system, several bugs were identified and corrected and the database was cleaned up. Possible alternatives were analysed, and problems and improvements were researched and discussed.

From there, we focused on a subproblem in order to explore one research topic in greater depth. The Proteomics application used XML in various places for data transfer. XML (Extensible Markup Language) offers a very flexible, semi-structured representation of data. Although XML is not new, many new applications of XML are being developed. XML, created specifically for web use, can be a very powerful language. In contrast to other markup languages, specifically HTML, XML describes the data itself and not how it should be presented. As a result, XML can be used as a data interchange format that is independent of any application or operating system. With this as motivation, the project continued with the study of XML and its relation to the web and databases. We decided to study the extraction of database information into files in XML format. Relational data access is performed via SQL SELECT statements, and the result of such a query is a set of records. Different techniques for transforming SQL query results into user-defined XML were evaluated, including DTD, XML Schema, XSL, and a predefined mapping file. The last two approaches were implemented using Java.

The first implementation used XSL (Extensible Stylesheet Language) to transform an SQL query result into a specified XML format. XSL is a mechanism for specifying the layout of an XML file. Our implementation first transforms an SQL query result into a standard XML format, grouped by record. Then the user-defined XML format, specified by the given XSL file, is applied to the standard XML format. Since the transformation is done using XSL, a wide range of output XML formats can be produced.
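The two steps described above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual library: the class name XslResultFormatter, the element names resultset and record, and the use of an in-memory list of rows in place of a JDBC ResultSet are all hypothetical. Only the standard javax.xml.transform API is used for the XSL step.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: names and the "standard" XML layout are assumptions.
final class XslResultFormatter {

    // Step 1: serialize the query result into a generic XML document with
    // one <record> element per row and one child element per column.
    // Assumes column names are valid XML element names.
    static String toStandardXml(List<Map<String, String>> rows) {
        StringBuilder sb = new StringBuilder("<resultset>");
        for (Map<String, String> row : rows) {
            sb.append("<record>");
            for (Map.Entry<String, String> col : row.entrySet()) {
                sb.append('<').append(col.getKey()).append('>')
                  .append(escape(col.getValue()))
                  .append("</").append(col.getKey()).append('>');
            }
            sb.append("</record>");
        }
        return sb.append("</resultset>").toString();
    }

    // Step 2: apply the user-supplied XSL stylesheet to the standard XML.
    static String applyXsl(String standardXml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(standardXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    // Escape the characters that are not allowed in XML text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

In a real application the rows would presumably be read from a JDBC ResultSet, using its metadata for the column names. Keeping the first step independent of the stylesheet means the same standard XML can be reused with different XSL files.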

However, using XSL, we cannot transform the SQL query results into any arbitrary XML file. In particular, XML allows for a hierarchical representation of data, where the information can be arbitrarily nested. We were looking for a second solution that could take full advantage of such nesting. This is useful, for instance, if we want to group information (e.g., for each student in a set of students, we want to list all of her or his grades). For that reason, we introduced our own XML mapping file that specifies the desired structure of the output XML file. Given a database query and an XML mapping file, our implementation produces an output XML file that is grouped hierarchically according to the XML format specified by the mapping file.
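The student and grades example can be illustrated with a short Java sketch. It is an assumption-laden illustration only: the format of the project's mapping file is not described here, so the Mapping class below is a hypothetical stand-in that names just a root element, a group element, a key column, and a child element. The class name NestedXmlGrouper and the helper row are likewise invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of hierarchical grouping; not the project's mapping format.
final class NestedXmlGrouper {

    // Stand-in for a mapping file: which column to group on and which
    // element names to use at each level of the output.
    static final class Mapping {
        final String rootElement, groupElement, keyColumn, childElement;
        Mapping(String rootElement, String groupElement,
                String keyColumn, String childElement) {
            this.rootElement = rootElement;
            this.groupElement = groupElement;
            this.keyColumn = keyColumn;
            this.childElement = childElement;
        }
    }

    // Convenience helper: build one row from alternating column/value pairs.
    static Map<String, String> row(String... kv) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) r.put(kv[i], kv[i + 1]);
        return r;
    }

    static String toNestedXml(List<Map<String, String>> rows, Mapping m) {
        // Group the flat rows by the key column, keeping first-seen order.
        Map<String, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> r : rows) {
            groups.computeIfAbsent(r.get(m.keyColumn), k -> new ArrayList<>()).add(r);
        }
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(m.rootElement).append('>');
        for (Map.Entry<String, List<Map<String, String>>> g : groups.entrySet()) {
            // One group element per distinct key; the key becomes an attribute.
            sb.append('<').append(m.groupElement).append(' ').append(m.keyColumn)
              .append("=\"").append(escape(g.getKey())).append("\">");
            for (Map<String, String> r : g.getValue()) {
                sb.append('<').append(m.childElement).append('>');
                for (Map.Entry<String, String> col : r.entrySet()) {
                    if (col.getKey().equals(m.keyColumn)) continue; // already on parent
                    sb.append('<').append(col.getKey()).append('>')
                      .append(escape(col.getValue()))
                      .append("</").append(col.getKey()).append('>');
                }
                sb.append("</").append(m.childElement).append('>');
            }
            sb.append("</").append(m.groupElement).append('>');
        }
        return sb.append("</").append(m.rootElement).append('>').toString();
    }

    // Escape characters that are unsafe in XML text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

With rows for (Ann, DB, A), (Bob, DB, C) and (Ann, OS, B) and a mapping of students / student / name / result, both of Ann's rows end up under a single student element even though they are not adjacent in the query result. This sketch handles only one level of nesting; deeper hierarchies would need a richer mapping description.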

Conclusions and Results Achieved:

In conclusion, we gained a better understanding of the principal concepts behind web-based database applications. We learned about the alternative technologies available, their advantages and disadvantages, and the different problems that can arise within web-based database applications. We extended our knowledge of the relationship between XML and databases. In the practical part, we developed a set of Java libraries that convert SQL query results into XML files according to given data formats.

Most importantly, we gained the experience of working in a research project team to achieve a common goal. This project helped refine our ability to troubleshoot problems and tailor solutions to fit specific applications. The opportunity to work independently on several parts of the project also allowed us to discover our personal strengths and weaknesses. The knowledge and experience gained from this unique opportunity will certainly help us in our future endeavors.


The project webpage can be found at:
http://www.cs.mcgill.ca/~kemme/crew/index.html