Showing posts with label computer science.

Wednesday, March 11, 2009

Basic Computational Complexity

It normally seems that around this time of the spring semester, my adviser, John Hitchcock, has to attend a conference or a meeting of sorts that takes him away from Laramie for a week or two, and I have the privilege of lecturing in his place.

I would like to do more teaching, but our department has a hard enough time keeping all but the core classes open, and there aren't many assistantships (due to our department nearly folding last school year), so I don't have the opportunity. I might have the chance at a community college after Sara and I graduate--a community college because that will probably be the closest institution of higher learning in the area--but that's for the future.

Anyway, the class, COSC 4200: Computation and Complexity, covers basic computation theory in the first half of the semester, focusing on regular languages and their associated finite automata, and on context-free languages with their associated grammars and push-down automata; the second half delves into Turing machines and some complexity theory.

Since these theory topics are within my area of study, I'm always fascinated by them. Personally, I find playing with DFAs and PDAs quite enjoyable. I tried to describe them as the tinker toys of computational theory to my wife, but sadly she felt that meant that talking about them was a waste of time. Sigh.

I personally find it interesting how some very simple paradigms of computation can accomplish a fair amount of work, and how the closure properties of the corresponding language classes differ. Regular languages, for example, are closed under union, intersection, complement, and concatenation. Context-free languages are closed under union and concatenation, but not under complement or intersection. More importantly, for regular languages, nondeterminism is no more powerful than determinism--one of the few cases where we know this happens (another being PSPACE = NPSPACE, due to Savitch's Theorem). Of course, we can only make this assertion because we restrict ourselves to a finite number of states with no memory device (like a stack or a queue), and there is still an exponential blowup in the number of states when converting an NFA to a DFA. That exponential blowup is one intuition for why it seems that P is strictly contained in NP.
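The subset construction behind that blowup fits in a few lines of Python. This is my own toy sketch, not anything from the course: each DFA state is a set of NFA states, so an n-state NFA can yield up to 2^n DFA states.

```python
from itertools import chain

def nfa_to_dfa(alphabet, delta, start, accepting):
    """Subset construction: each DFA state is a frozenset of NFA
    states, so in the worst case an n-state NFA yields 2^n DFA states."""
    start_set = frozenset([start])
    dfa_delta, seen, worklist = {}, {start_set}, [start_set]
    while worklist:
        current = worklist.pop()
        for a in alphabet:
            nxt = frozenset(chain.from_iterable(
                delta.get((q, a), ()) for q in current))
            dfa_delta[(current, a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                worklist.append(nxt)
    dfa_accepting = {s for s in seen if s & accepting}
    return seen, dfa_delta, start_set, dfa_accepting

# Classic example: "the second-to-last symbol is 1" has a 3-state NFA,
# but the DFA needs 4 = 2^2 states (k-th-from-last needs 2^k in general).
nfa = {(0, '0'): {0}, (0, '1'): {0, 1}, (1, '0'): {2}, (1, '1'): {2}}
states, _, _, _ = nfa_to_dfa('01', nfa, 0, {2})
```

For the k-th-symbol-from-the-end language the blowup is provably unavoidable, which is what makes it the standard example.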

On the other hand, context-free languages are essentially defined nondeterministically, and general CFLs are strictly more powerful than deterministic CFLs--a case where a separation between determinism and nondeterminism is actually known.

One of the neat things about playing with DFAs is how little memory they have. They consume the input as they read it, and the only memory available is the current state--a tiny, finite amount. For example, we can craft a DFA to remember the last c bits it has read, where c is some constant. Useful, but very limited, considering that most problems require unbounded amounts of memory to solve.
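That last-c-bits trick can be made concrete (again my own sketch): the DFA's state is literally the last c bits seen, so the machine has exactly 2^c states and nothing else to work with.

```python
def make_suffix_dfa(c, target):
    """Build a DFA over {'0','1'} whose state is the last c bits read
    (zero-padded at the start). It accepts iff those bits equal target.
    The state space is all c-bit strings: 2^c states, no other memory."""
    assert len(target) == c
    def accepts(s):
        state = '0' * c                   # the DFA's entire memory
        for bit in s:
            state = (state + bit)[-c:]    # shift the new bit in
        return state == target
    return accepts

ends_in_01 = make_suffix_dfa(2, '01')
```

So `ends_in_01` accepts exactly the binary strings ending in "01"--and no DFA of this family can tell you anything about bits more than c positions back.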

This limitation of DFAs makes moving up to PDAs more exciting (at least for me), because PDAs are essentially NFAs with a stack. A stack operates on a LIFO (last in, first out) basis but has unbounded capacity. This means we can remember what we've read--only when we try to read it back, we get it in reverse order.
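As a small illustration (my own sketch, not from the lecture), here is the stack idea behind a PDA for {a^n b^n : n >= 0}, the standard first example of a language that is context-free but not regular; this particular language even has a deterministic recognizer:

```python
def accepts_anbn(s):
    """PDA sketch for {a^n b^n}: push a marker for each 'a', pop one
    for each 'b', and accept iff the stack empties exactly at the end."""
    stack = []
    i = 0
    # phase 1: push a marker for every leading 'a'
    while i < len(s) and s[i] == 'a':
        stack.append('A')
        i += 1
    # phase 2: pop one marker for every 'b'
    while i < len(s) and s[i] == 'b':
        if not stack:          # more b's than a's
            return False
        stack.pop()
        i += 1
    # accept iff all input is consumed and the counts matched
    return i == len(s) and not stack
```

No DFA can do this, since matching the counts requires remembering an unbounded number n.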

One thing I wondered about when I first learned this was what would happen if we used a queue instead of a stack. The answer: we get the equivalent of a Turing Machine. So a single queue gives us all the computational capabilities (minus a speed factor) of the PCs we have in front of us.

I was rather pleased when the class thought to ask about that during lecture yesterday.

Thursday, February 19, 2009

Skynet is Coming!

We have word that government officials are discussing the potential disaster of military robots turning on their makers, Terminator-style.

To this, I can only roll my eyes. People out there have way too much imagination on this kind of thing. Let me ask a question:

If I presented you with a robot that I had programmed, and I said I knew so little about how its programming would manifest, would you buy that robot from me?

I'm sorry, but these things do what we program them to do. They can only operate within the parameters of their code. If they "turn" on their makers, it is because someone botched his share of the programming. There will be no sentient malice behind any robot "attacks" on their creators. Accidents, maybe; Terminator-style scenarios? Not in our dreams.

Friday, October 10, 2008

What is it that you do here?

One of the problems of being a theoretical computer scientist is that it is difficult to justify to the lay person just what it is that we do. In a flippant moment, I just say, "I stare off into space and try to think of theorems to prove." That is easier than the following: "I examine the relations between classes defined by reductions to sparse sets, because the ramifications of those sets having measure zero have implications for the relation between P and NP." It is easier because in the second explanation I have to explain classes and reductions and what it means for a set (i.e. a language) to be sparse, and then I need to explain P and NP and why the question of whether they are equal is important. And then I need to explain why sparse sets are important in the relation between P and NP, which involves not only those classes themselves, but also Boolean circuit classes like P/poly, and the ramifications for derandomization... It is a mess to explain, and most people don't have the hours it takes just to understand the basic principles at play.

Still, there's an answer I wish I could give, but can't, since it isn't my research. I could simply say: "Theoretical computer science tells us that it is impossible to have a 100% reliable virus detector." The explanation is probably still a little too complex for the lay person, but here goes.

There's a problem in computer science, a language (a set of strings) defined as follows:

HALT = { <M, x> | program M halts on input x }

This is the notorious halting problem, which no computer program can solve. Why? Well, suppose such a program exists, and call it N. We can encode N as a string, as we can do with all programs (how do you think they're stored on your hard drive, after all?). Then we can make a new program P:

P: Input x. If N halts on x, run forever. Else, halt.

Now, in similar fashion, we can encode P as a string (which, of course, depends on being able to encode N as a string). Furthermore, we can feed P itself as an input! But then, what do we have? Does P halt on input P? Well, if it does, then P runs forever (i.e. doesn't halt), and if it doesn't halt, then it halts. Obviously this is a contradiction, and it arose from assuming the halting problem has a program that decides it. So there must be no program for the halting problem.
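The diagonal trick can even be mirrored in Python (illustrative only; a real decider N cannot exist, so here we hand the construction some claimed decider and watch it get defeated):

```python
def make_diagonal(halts):
    """Given any claimed halting decider halts(program, input),
    build the program P that does the opposite of its prediction."""
    def P(x):
        if halts(x, x):
            while True:       # predicted to halt, so loop forever
                pass
        else:
            return            # predicted to loop, so halt at once
    return P

# Any concrete decider is wrong about the diagonal program built from it.
always_loops = lambda prog, inp: False   # claims nothing ever halts
Q = make_diagonal(always_loops)
Q(Q)   # halts immediately, refuting the decider's own prediction
```

Whatever the decider answers about P on input P, P's behavior contradicts it--which is the whole proof in one function.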

Now let's turn to the problem of our foolproof virus checker. Present-day virus-checking programs look for signatures, i.e. particular patterns in the binary code of a suspicious program. But we know that isn't foolproof: it requires knowing the signature ahead of time, and besides, a program might contain a signature somewhere in its code yet be built to never execute it--in which case it might not be a virus at all. So we want our foolproof virus checker to be capable of examining a program and telling us decisively whether it is a virus (i.e. whether the program is capable of replicating itself, perhaps doing damage in the process).
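Signature scanning really is this simple at heart. A deliberately naive sketch (the signature bytes are made up) shows both weaknesses at once: it only recognizes patterns it already knows, and it cannot tell whether the matching bytes would ever execute.

```python
def signature_scan(binary, signatures):
    """Naive signature check: flag a file iff any known byte pattern
    appears in it. This misses brand-new viruses entirely, and it
    flags dead code that would never actually run."""
    return any(sig in binary for sig in signatures)

known = [b'\xde\xad\xbe\xef']     # made-up signature "database"
flagged = signature_scan(b'..\xde\xad\xbe\xef..', known)
```

A mere substring match says nothing about the program's behavior--which is exactly the gap a "foolproof" checker would have to close.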

But here's the thing. Having a foolproof virus checker would give us a program to solve the halting problem, which we have already shown cannot have a program. Here's the proof:

Suppose we know that Program A is a virus, i.e. if it runs, it will replicate itself and perhaps do some damage to the computer. We can pad A with useless instructions that do not affect how A functions (save maybe that A will run a little slower). This is one method virus writers use to defeat signature checking, by the way. So let's take an instance of the halting problem, some pair <M, x>. We want to know if M halts on input x. We create Program B by "padding" A with the code of running M on x. In other words,

B: Run M on x. Then run Program A.

In other words, if M halts on x, B has the same behavior as A. Since A is a virus, B must also be a virus. However, if M doesn't halt on x, B never executes A, and thus can't be a virus, since it neither replicates itself nor damages the computer (besides running forever, of course).

Here's the catch: our foolproof virus-checker can then decide the halting problem. It examines B, and if B is a virus, then M must halt on x. If B isn't a virus, then M must run forever on x. But, as we said before, no program can decide the halting problem, so our foolproof virus-checker is a sham. It can't exist.
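The whole reduction fits in a few lines. In this sketch, `is_virus` stands for the assumed foolproof checker--which is precisely the thing that cannot exist:

```python
def make_B(M, x, A):
    """Build B from a halting instance <M, x> and a known virus A:
    B reaches the virus code iff M halts on x."""
    def B():
        M(x)    # if M runs forever on x, the next line is unreachable
        A()     # known virus behavior
    return B

def halts(M, x, is_virus, A):
    """A halting decider built from an assumed perfect virus checker.
    B is a virus iff M halts on x, so deciding 'is B a virus?'
    decides the halting problem--hence is_virus cannot exist."""
    return is_virus(make_B(M, x, A))
```

Of course this only runs end to end if you supply some stand-in for `is_virus`; the point is that any correct one would contradict the undecidability of HALT.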

I think this justification of theoretical computer science would help explain our field's importance, in one sense. But then, it might also turn people off to the field, if all we can do is assure the public that no computer is 100% safe from viruses!