On January 07, 2001, Elrac said:
Bit fields are a natural fit for representing sets. Sets are
collections of things in which some element is either present,
exactly once, or not at all. Bit fields are great for encoding
things like True/False questionnaires.
My basic scenario here is from one a spy movie where an
average Joe is recruited from the population of America
because the CIA needs a person with a very specific set of
talents for a mission, which involves impersonating a person
with just these talents, or something similarly far-fetched.
Of course the Census has gathered information on the skills of
everyone in the nation for just such a purpose.
You'd have (say) 8 questions like:
- Can you handle hyperbolic trigonometry?
- Are you able to weave baskets underwater?
- Do you practice Applied Magic?
- Do you understand the Special Theory of
Relativity?
- Can you ride a horse bareback?
- Do you speak fluent Esperanto?
- Can you change a light bulb with your hands tied
behind your back?
- Have you navigated a nuclear submarine before?
A set of answers would look like this: Y, Y, Y, N, N, Y,
Y, N.
If you wanted to store the answers in a database (for whatever
purpose), you could set up 8 fields of 1 character. That's
very straightforward but a bit wasteful. If you use single bit
fields, you wouldn't even be wasting too much space, and you
could query the answers flexibly. But for our spy mission, we
aren't interested in anything except a perfect match on some
subset of these 8 skills. And because the fate of the nation
is in the balance, the query must be very quick. Ideally, a
match should be determined by a simple and quick operation on
a single field.
Take our answer set: Y, Y, Y, N, N, Y, Y, N. The
answers are already in binary, so it would make sense to
encode them in binary, as: 11100110. But of course a bit
sequence like this looks a lot like an integer in binary. If
you punch 11100110 into a calculator that "does"
bases (Microsoft's Calculator applet does, in Scientific
mode), you can convert it to 230 decimal.
Converting binary to decimal is pretty easy: Just assign a
power of 2 to each bit. If I want things to work as they do in
the Calculator, I have to assign 128 to the first question, 64
to the second, 32 to the 3rd and so on to 1 for the last one.
Here's a table:
| Q# |
Value |
| 1 |
128 |
| 2 |
64 |
| 3 |
32 |
| 4 |
16 |
| 5 |
8 |
| 6 |
4 |
| 7 |
2 |
| 8 |
1 |
Time to step back and consider why one would want to DO such
gyrations. Let me remind you of my premise/promise that a
number of conditions could be tested with a single quick
operation. Oops... Mission Control is calling. They need a
bareback horse rider and nuclear submarine pilot who can
change light bulbs with his hands tied behind his back. Let's
see: That's questions 5, 7 and 8, or binary 00001011. The
decimal representation is found by simply adding the decimal
values of the bits, or 8 + 2 + 1 = 11. Now... let's write down
the answer set from above, and our requirements set from just
now, below each other:
11100110
00001011 In order for someone to qualify, he
needs to have 1's at least in those places where the
requirement has 1's. This is where the bitwise AND comes in.
Many language have this operator because it's easy for a CPU
to compute and it's often handy for stuff like this. SQL
Server uses a '&', C uses '&&.' The bitwise AND
returns a result where there are 1's in all those places where
there are 1's in the same column in both operands. Let's work
this out for our example:
11100110
& 00001011
----------
= 00000010 OK, our result has a single bit
set, it's the match on the light bulb changing. But there's no
match on the other two qualities, so the result is mostly 0's.
In decimal, this trial comes out as: 230 & 11 = 2. Enter
Albert Schwarzenegger, who answers 'YES' to all the questions:
11111111
& 00001011
----------
= 00001011 In decimal: 230 & 11 = 11.
So... even the most qualified applicant never gets more bits
set than there are in the requirement. After all, we don't
care about the other bits. What this shows us is how to
formulate the query for the requirement: In order for an
applicant to meet ALL the requirements, (applicant &
requirement = requirement).
This shows us what bit vectors are best at: Quick selections
on a bunch of yes/no criteria, where ALL the required 'Yes'
criteria need to be met for a match. A simple test made on a
number stored in a single field tells us if this criterion is
met or not.
This principle can be extended to test for 'no's by using
bitwise OR ('|') on the operands, or inverting some or all of
the bits of the operands using the Exclusive OR ('^')
operator. Softer criteria can be obtained by combining several
such queries with logical OR... but there comes a point where
the queries are no longer more efficient than queries on
other, more common forms of data representation. But for a
reasonable number of Yes/No criteria (remember, an integer
will only hold 32 bits!) or criteria with a small number of
possible values (which again can be coded into a few yes/no
bits), this is the most efficient method of storage and
querying.
|