CampSmalltalk: Interfacing Apache to Smalltalk

This topic is pretty closely related to the more general Smalltalk Web Application Server.

Organizer: ???

Members:

Gerardo Richarte
John McIntosh
Joseph Bacanskas (I would like to work on the Smalltalk Web Application Server project, also)
Nick Melnikov
Jan Barger

Guests:

Andreas Kuckartz

A bunch of us have been discussing hacking Apache so that it can talk to a Smalltalk server through TCP/IP. There are several existing packages that let you do this with Perl, Java and Tcl. We want to build one for Smalltalk. The primary advantage is that we don't need a web server in Smalltalk; we can run alongside Apache. This frees us up to tackle the backend stuff (e.g. wiki, dynamic page generation, storing data in an OODB) and we get a really good, scalable, fast web server to boot.

Another potentially interesting server is AOLServer (see http://www.aolserver.com for details and http://www.greenspun.com for reasons why it's interesting). It's become open source within the last week.

Also, David Farber posted the following on the Squeak List recently.

See WhiteCap at www.jdmsoft.com/WhiteCap.html for a mod_jserv <-> VW31 connection.

FastCGI Enables Smalltalk to Talk with Web Servers

FastCGI is an open source interface for connecting web servers to applications: high speed, scalable, and independent of OS, web server and language. Perfect for linking Smalltalk and web servers of all kinds.

What we need is a Smalltalk FastCGI client implementation that is portable across all versions of Smalltalk. Any takers?

- Peter Lount, 20001022

As soon as I can figure out how to configure Apache for FastCGI I plan to implement clients in VW & Squeak. -Roger Whitney 9-12-2000
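
For anyone sizing up the client work: every FastCGI message travels as a record with a fixed 8-byte header (version, type, request id, content length, padding length, one reserved byte). Below is a minimal sketch of that framing in Python, used here only as a neutral illustration; a Smalltalk client would do the same byte packing. The helper names `pack_record` and `unpack_record` are made up for this note and do not come from any existing library.

```python
import struct

# FastCGI record header: 8 bytes in network byte order.
# version (1), type (1), request id (2), content length (2),
# padding length (1), reserved (1, written as a zero pad byte).
FCGI_HEADER = struct.Struct("!BBHHBx")

FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1  # record type numbers from the FastCGI spec
FCGI_PARAMS = 4
FCGI_STDIN = 5
FCGI_STDOUT = 6


def pack_record(rec_type, request_id, content=b""):
    """Frame `content` as one FastCGI record (hypothetical helper)."""
    if len(content) > 0xFFFF:
        raise ValueError("content too long for a single record")
    # Padding is optional in the protocol; here we align to 8 bytes.
    padding = -len(content) % 8
    header = FCGI_HEADER.pack(
        FCGI_VERSION_1, rec_type, request_id, len(content), padding
    )
    return header + content + b"\x00" * padding


def unpack_record(data):
    """Return (type, request id, content) for one record in `data`."""
    _version, rec_type, request_id, length, _padding = FCGI_HEADER.unpack_from(data)
    start = FCGI_HEADER.size
    return rec_type, request_id, data[start:start + length]
```

A real client also has to handle the name-value pair encoding used by the parameter records and the begin/end request bodies; this sketch covers only the outer framing.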

From: David Farber

Subject: [ANN] Job Opening & Project Announcement

(Given the threads that erupted in the past 48 hours about server-side squeaking, this post seems kind of eerie...)

The internet development company i work for is planning a new web project and is going to let me use Squeak as the implementation platform. my plan is to build a server-side application environment based on Apache and Squeak. Since there is not yet an Apache module for Squeak, top priority on my list is to build one. also, an interface to some sort of data storage back end, whether it be a relational or object database, is needed. i have 2-3 months to do this in.

Management has agreed to hire at least one more person to help me with all this. we are located in Denver, Colorado (i live and commute from Boulder) and while physical proximity has its advantages, there would be no problem with someone working remotely. if you wanted to work remotely, this would be a 2-3 month contract job; if local, or you wanted to move, we would be more than happy to hire someone full time. i need someone who would be as comfortable coding the necessary C code to interface to Apache or a database as coding in Smalltalk. please do drop me an email if this at all sounds like your cup of tea.

As much of the project infrastructure (Apache module, application architecture, database interfaces) as possible will be released as Open Source.

david

- David Farber dfarber@numenor.com

So, do you really need to hack Apache, or can you reuse one of the existing modules and just write Smalltalk to match it? Even if the existing modules can be improved, it will be hard to know how to improve them unless you have experience with Apache. It is probably best to just use it for a while, and to only change it based on experience.

Re-using is definitely better. As I understand it there are two main possible ways to talk to Apache.

Connect by sockets to a running image/images. This is like the Apache JServ stuff, and one would think we could re-use the C part of that directly and just implement the protocol. Peter Lount looked at it and there was a lot of complication, e.g. strange shared memory manipulations. One downside of this is that I get the impression they're talking about radically changing the protocol for a future version. It's really not clear from the web site, but there are hints.

Update: It looks like JServ is more or less dead. Apparently most of the effort went into the Jakarta project, which is tied to Tomcat, which is Sun's Java Web Server evolved into an Apache thingy. See Jakarta.

No it's not! See java.apache.org; there really is a lot of activity going on with JServ, and it would be a perfect place to plug into the Apache framework. The company I work for has done exactly that with Common Lisp, and we are quite pleased with the approach. You can download our open-source software from alpha.onshore.com/lisp-software. Anyway, JServ is a great product and is certainly not dead. Check it out again! -- Lyn Headley

Actually link Smalltalk directly into the Apache executable. This is more like the Apache mod_perl approach. This would be more difficult to do with an arbitrary Smalltalk, though it would certainly be possible with Squeak, and might be possible with other versions. I don't think we could re-use much. -- Alan Knight

Isn't it possible to communicate directly with the module (assuming it is a DLL)?
In such a scheme VMs (or any other external process) and Apache server(s) register with the module, and the module can encapsulate the difficult stuff like shared memory and process scheduling.
Such a module wouldn't be Smalltalk specific, so more people might be interested in helping to write it.
What remains is the task of connecting the different Smalltalk dialects to the DLL and creating a portable Smalltalk API.
--Reinout Heeck

You can find out info on the ModSmalltalk at mod.smalltalk.org. It's too bad that JServ is dead, but we could possibly use its source code as a starting place or as an idea of what might be required for a professional system.

Linking a Smalltalk directly into Apache is a good idea but it may require the source code for the Smalltalk. If we modified a copy of Squeak this might work great. Actually it might be possible to create a Squeak plugin and use that as a doorway into Apache.

-- Peter Lount

i am actually not familiar at all with the sockets/JServ stuff. my intention was to do a mod_perl style plugin. the kicker here (and this may be what Peter saw in the JServ stuff) is that Unix Apache forks children to handle the actual HTTP requests. so, if you want to store information to be available across HTTP requests (i.e. "session" information) you have to have the child processes share memory. i've got "Writing Apache Modules with Perl and C", which does a good job of discussing everything it takes to write Apache modules--except when it comes to shared memory between the child HTTP processes. -- David Farber

I bet that it would be better to use a shared DBMS than shared memory. A shared DBMS is more scalable. If all sharing is through a DBMS then the total throughput of your system is limited by the throughput of the DBMS, not by the speed of your web servers. You can then easily add more web servers, and they will use the DBMS for concurrency control. Ralph Johnson

What he said. In general, for high-volume stuff, statelessness is the way to go. The big benefit is that you don't have to keep a bunch of state around in memory for things that may never use it again. As soon as a request finishes, write any permanent information out to some form of persistent store and read it back in again if necessary. This also allows user requests to be independent of particular servers, since it doesn't matter where a request comes back to. Commercial systems typically use this kind of approach, and if session information can be shared it is shared by storing it out to disk and being able to read it back in rather than by sharing memory. -- Alan Knight.
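
The stateless pattern described above can be sketched as follows: load whatever session state exists at the start of a request, and write it back to a persistent store before the request finishes. This is only an illustration in Python, with the standard-library `sqlite3` module standing in for the shared DBMS; the table layout and the function names (`open_store`, `load_session`, `save_session`, `handle_request`) are assumptions made up for this note.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the session store. A real deployment would point every
    server process at one shared database; ":memory:" is private to
    this connection and is only convenient for trying the code out."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS sessions "
        "(id TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    return db


def load_session(db, session_id):
    """Read the state back in, or start empty for an unknown session."""
    row = db.execute(
        "SELECT data FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}


def save_session(db, session_id, state):
    """Write the state out so no server has to keep it in memory."""
    db.execute(
        "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
        (session_id, json.dumps(state)),
    )
    db.commit()


def handle_request(db, session_id):
    """One request: load, update, store. Nothing survives in memory."""
    state = load_session(db, session_id)
    state["hits"] = state.get("hits", 0) + 1
    save_session(db, session_id, state)
    return state["hits"]
```

Because each request touches only the store, it does not matter which server process a user's next request lands on.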

hmmm...i'm going to have to mull that over for a little bit. i would have thought that the DBMS would have been the bottleneck, not the HTTP server. how can a DBMS have more throughput than a web server? David Farber

The main bottleneck is memory. If you can keep everything in memory, then it's obviously a win, but with a high-volume system, any per-request state that sticks around between calls can quickly swamp the system. This is particularly true in a web app where people can just quit a browser (or crash, or go for lunch). How long do you keep that state in memory? Shared memory is particularly expensive. -- Alan Knight

Philip Greenspun has written a wonderful book on how to build web servers. He explains why you should have a database at the center. I was convinced. I highly recommend the book to anybody who wants to think seriously about web servers, which should include people who want to build them.

I absolutely agree. I've just been reading this book, and it's both highly educational and inspiring that people can do well out of good technology even if it goes against the prevailing hype. (He also runs a company which is currently about 70 people.) -- Alan Knight

A database CAN be the bottleneck. I can imagine building a special in-memory database for people with small amounts (i.e. a few hundred megabytes) of memory and high throughput requirements. But a lot of applications will be better off with a traditional database. In any case, all of them need a database of some kind. -- Ralph Johnson

Assuming we have a good Smalltalk interface with the web server, we can implement persistence in a variety of ways in Smalltalk, whether to an RDBMS directly or via object mapping, or OODBMS, or in memory structures. Or am I missing something? Jeff Odell

I'd like to join this project. But now I've been sick and can't come. Sigh. -- Bjorn Freeman-Benson

You are in, as I am too... I read the thread reading 'Squeak' where it says 'Smalltalk'. Up there David said that he wants to write something like mod_perl, and he said that Apache forks a child for each request. I don't think that letting Apache fork a Squeak VM for each request is a good idea. We need some VM sharing capability. I'm not sure if using sockets is a good idea (Squeak's Sockets are not my favorite), but something must be done along this line. Probably the best solution is to reimplement Squeak's Sockets (if it's needed) and use them? I'm not sure. -- Gerardo Richarte

Any kind of per-request forking would be bad. That's why people don't like CGI for large systems. Apache is a pre-forking system, that is to say that it forks a pool of processes, then delegates requests to them. Within those processes, there's (typically) no threading. One would presumably not want to fork an ST VM for each of those. One might fairly easily, though, have those Apache processes just communicate with one or a few ST VMs. -- Alan Knight
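
To make that arrangement concrete, here is a rough sketch of a single long-running backend that many front-end worker processes connect to over TCP. It is written in Python purely as a stand-in for the Smalltalk VM, and the one-line-in, one-line-out protocol is invented for this note; it is not JServ's protocol or FastCGI. The names `BackendHandler`, `start_backend` and `ask_backend` are hypothetical.

```python
import socket
import socketserver
import threading


class BackendHandler(socketserver.StreamRequestHandler):
    """Serves one front-end connection: one reply per request line."""

    def handle(self):
        for line in self.rfile:
            path = line.decode("utf-8").strip()
            reply = f"handled {path}\n"
            self.wfile.write(reply.encode("utf-8"))


def start_backend(host="127.0.0.1", port=0):
    """Start the shared backend in a background thread.
    Port 0 asks the OS for any free port."""
    server = socketserver.ThreadingTCPServer((host, port), BackendHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask_backend(address, path):
    """What each web server worker would do for one request."""
    with socket.create_connection(address) as conn:
        conn.sendall(path.encode("utf-8") + b"\n")
        return conn.makefile("rb").readline().decode("utf-8").strip()
```

For example, `server = start_backend()` followed by `ask_backend(server.server_address, "/wiki/FrontPage")` sends one request through. The web server keeps its own pool of processes, and none of them needs a VM of its own.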

"A bunch of us have been discussing hacking Apache so that it can talk to a Smalltalk server through TCP/IP." Do we really need such cooperation between Apache and ST?
1, - What are the main problems in the ST web server code that lead to this decision?
2, - Do we want to move to a non-standard approach that will depend in future on the evolution and changes of both systems?
3, - If we can analyze the problems in the ST code or socket primitives, we are rid of these problems...

My personal opinion is that:
1, The main problem is request latency and the speed of serving objects and large data from ST. (A solution will need some small code developed and moved into primitives, which will serve the main web pages and will be user configurable through just a few bug-free options.)
2, If you mix two fine cocktails, where is the proof that the result will be better than either of them?
3, You can throw away any kind of sorrow by analysing it.
--- Jan Barger

If Object Oriented programming has anything to do with object reuse, one of the things we should deliberately reuse is web servers. It doesn't really matter whether or not there's anything wrong with a Smalltalk web server--except that competing as a web server technology is not (I think) Smalltalk's core competency. I expect it would be much easier to include Smalltalk as a VM behind an existing, supported web server than it would be to suggest adding a new web server. Who would maintain it? How would it be configured? Would it support everything Apache's httpd.conf supports? Will it do proxying? Should it do proxying? None of these are questions Smalltalkers wanting to put Smalltalk behind a web server should have to answer. Use what's there, and put Smalltalk behind it to do the heavy lifting.

-- Thomas Gagne

Hmm, Thomas ... imagine you are a new car dealer. And you want to introduce the super new ST OOP car and compare its abilities with all the other cars you are already selling. You know that the ST OOP is just a newcomer, with all the bugs and mistakes that the engineers have not yet found and fixed. But it has a super engine, and your feeling tells you that this car, and maybe its next fixed model, will be perfect, and you want to show only the advantages and hide the bugs ...

Can you start your show by using some other old truck and putting the new car only on top of it? Without any possibility to turn on the engine and leave the other cars 1 mile behind?

Well, you know that it is dangerous and some parts are not yet tested for such speed and power ... but maybe ...

If you decide to put the new car on the old truck, it is like using the MS-DOS shell, with its perfect ability to promote some Windows advantages on the command line, or like thinking the "C" way in a Smalltalk environment.

--- Jan Barger

No one has mentioned fastcgi. While fastcgi hasn't fared well over the years it might be useful for this project. There is an Apache fastcgi module so all you'd have to do is implement the fastcgi protocol on the smalltalk side; you could use the C based fastcgi libs to do this.

As for the in/out of process question. The main advantage of having a VM in each Apache process is that you can have smalltalk hook directly into the Apache API. I use this frequently in mod_perl to: define custom authentication, custom logging, custom error handling, etc ... If you are out of process you can't do that.

As for the "why Apache?" question: if all you want to do is smalltalk, then you probably don't need Apache. However, if you plan on using existing mod_perl, have lots of static content, need an SSL based proxy, etc ... then having Apache as the front end is nice.

--- Dave Tauzell

I am looking for a way to expose Smalltalk (VW2.51) functions to the web. Based on what I have read, FastCGI/Apache seems like a reasonable approach. I love smalltalk, but have no time to re-invent the wheel. Is anyone else interested in discussing this? Perhaps collaboration on a Smalltalk FastCGI client? Time is of the essence for me, so email ASAP if you are at all interested. campsmalltalk@dougphelan.net

--- Doug Phelan

I think people use Apache too much just because it's there. A web server is not a complex program, really. HTTP is designed to be easy to parse and easy to implement, and it is. And it's also very frangible, you can just implement a bit of a webserver if that's all the functionality you need. Look at CGI, for example... I had a quick program I needed to provide web access to, it was about 20 lines of UNIX shell scripts. It was no more work to read and parse a GET request and spit out the information needed than it would have been to read and parse CGI input, so I did it and ran it from inetd.

Yes, there's all sorts of complexity you need to add on top of that, but if you're going to be doing things like maintaining persistent state in Smalltalk you're going to need to have a lot of the code to manage cookies and things duplicated anyway.

So for this project.. would it take more effort to write a Smalltalk interface to HTTP than FastCGI?

--- Peter da Silva
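
To give a feel for how little is needed for the "just a bit of a web server" case above, here is a sketch that parses a request line and produces a plain-text reply. It is Python rather than shell or Smalltalk, it ignores headers, HTTP/0.9 requests and everything else a real server handles, and the function names `parse_request_line` and `respond` are invented for this note.

```python
from urllib.parse import parse_qs, urlsplit


def parse_request_line(line):
    """Split 'GET /path?query HTTP/1.0' into method, path and
    a dict of query parameters."""
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, _version = parts
    url = urlsplit(target)
    return method, url.path, parse_qs(url.query)


def respond(line):
    """Build a complete HTTP/1.0 response (as bytes) for one request line."""
    try:
        method, path, params = parse_request_line(line)
    except ValueError:
        return b"HTTP/1.0 400 Bad Request\r\n\r\n"
    if method != "GET":
        return b"HTTP/1.0 501 Not Implemented\r\n\r\n"
    body = f"path={path} params={sorted(params)}\n".encode("ascii", "replace")
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n\r\n"
    )
    return head.encode("ascii") + body
```

Run from inetd, a wrapper would read the first line from standard input, pass it to `respond`, and write the result to standard output. Cookies, sessions and the rest still have to be layered on top, which is the trade-off being asked about.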

I am now dealing with this issue (again..). I wrote a little C program (for win32) to allow VisualWorks 2.5.2 to be used as cgi. cgiToIp.C is called by the web server, reads info on stdin, connects to smalltalk on socket. Smalltalk reads request from socket, writes result back to socket, cgiToIp.C sends result to stdout and quits.

No conversations between client and smalltalk (only request & answer), but it meets our need.

campsmalltalk@dougphelan.net

--- Doug Phelan
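
For readers who want to try the same request-and-answer relay, here is a rough sketch of the idea in Python. It is not the cgiToIp.C source, which is not shown on this page. The framing is an assumption: the relay sends the whole request, closes its sending side to mark the end, then reads the reply until the backend closes the connection. The host and port in `main` are placeholders.

```python
import socket
import sys


def relay(request, host, port):
    """Send `request` bytes to the backend and return its full reply."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(request)
        # Assumed framing: half-close tells the backend the request is done.
        conn.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # backend closed the connection: reply complete
                break
            chunks.append(chunk)
    return b"".join(chunks)


def main():
    """CGI entry point: stdin in, stdout out, then exit.
    127.0.0.1:8008 is a placeholder for wherever the image listens."""
    request = sys.stdin.buffer.read()
    sys.stdout.buffer.write(relay(request, "127.0.0.1", 8008))
```

The web server would invoke a script that calls `main()` once per request. As noted above, there is no conversation beyond a single request and answer, and the per-request process start is the usual CGI cost.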