Friday, September 28, 2012

How to bypass web filters


There are several occasions where you will be at a public terminal and require access to a particular website that is blocked for some reason or another. How to bypass these restrictions is a very common question, and will be covered here.

Let's pretend for a moment that the Internet is made up of 26 websites, A-Z. The web filter blocks your browser from accessing sites X-Z, but not sites A-W. Simply make the browser think you're going to A-W. There are a variety of ways to do this:


Proxy Servers: 
This is a list of http proxies. These sites may not be up forever, so you may need to search for free http proxy or public proxy servers or other similar terms.

Proxy server lists:



Now that you have a list of proxies, you would open IE (Internet Explorer) and click on Tools > Internet Options > Connections > LAN Settings > Advanced. Enter the address and port of one of the servers from the list in the proper area (HTTP) and make sure the "Use a proxy server for your LAN" option is selected. Remember to restore the original proxy and port settings on the terminal when you're done.

*Note: Some proxies listed may not work, and this method may decrease your surfing speed. By trying various entries, you’ll find one that works, or works faster.

The infamous translation trick:
Go to a web page translation site and use their services to "translate a page to English", thus accessing the blocked page through their trusted site.

You'll notice that several translation sites are blocked, but by using less popular ones, this method can still be effective. Here is a list of some translation services. Again, these sites may not be up forever, so you may need to search for them.




Url Scripting:


Url scripting is the easiest method. It works on a select few web filters and is based on the same principle as the translation trick. When you type an address like "www.yahoo.com@www.restricted_site.com", the filter will not go into effect, as it recognizes the trusted site (in this case yahoo.com).
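
Why the trick only works on a select few filters can be seen by parsing such an address properly. The sketch below uses Python's standard urllib.parse; the host name blocked.example is a made-up placeholder, and the function name is mine, not something from any particular filter product. Everything before the @ is treated as user information, so a filter that extracts the real host is not fooled.

```python
from urllib.parse import urlsplit

def effective_host(url):
    """Return the host the browser would actually connect to."""
    return urlsplit(url).hostname

# Hypothetical address in the style described above.
url = "http://www.yahoo.com@blocked.example/"
parts = urlsplit(url)

print("userinfo part:", parts.username)   # the "trusted" name is only userinfo
print("real host:", effective_host(url))  # this is what a filter should check
```

A filter that compares effective_host() against its block list, rather than searching the raw string for a trusted name, sees through the address.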

Other tricks:
Simply open the command prompt and type:
ping restricted.com (restricted.com obviously being the restricted site)
At this point you can take down the IP address (ex. 216.109.124.73) and enter it into the browser. If access to the command prompt is also restricted, see "How to bypass restrictions to get to the command prompt." If this article has been taken from information leak, then know that it involves anything from opening the browser, selecting view > source, then saving it as X.bat and opening it to opening a folder or browser and typing in the location of cmd.exe depending on the OS. I will not go into this further, as it is a completely different topic.

Use https://restrictedsite.com as referring to it as a secured site may confuse the filter.

Note: These are ancient methods that many new filters defend against, but still may be applicable in your situation. If not, a little history never hurt anyone.

Web based Proxies:
Another one of the easier, yet effective, methods is to use web based proxies. These are simple in that you just enter the restricted address and surf! Some of these have restrictions, like daily usage limits, but you can also use another proxy (perhaps one that sucks, like a text-only one) to bypass their restrictions as well. Here is a list of some:




Proxy Programs:
There are many proxy programs that allow you to surf anonymously that are more or less based on the same topics we’ve covered here. I’ve added them just to cover the topic thoroughly:



Making your own CGI proxy server:

Making your own proxy server may come in handy, but I personally find that simply uploading a txt file with a list of proxies to a free host makes for a much easier and headache-free solution. If you don't know Perl, there is code out there to help you set it up. Check out these sites for more info:

http://httpbridge.sourceforge.net
http://www.jmarshall.com/tools/cgiproxy
http://www.manageability.org/blog/stuff/open-source-personal-proxy-servers-written-in-java/view



Admin Access:
When all else fails, you can simply take over the PC and alter or delete the damn filter. This method varies according to the OS (operating system) you are dealing with. Please see Hacking Windows NT for more information. If this tutorial has been taken from information leak, then I will go as far as to say it involves booting the PC in another OS, copying the SAM file and cracking it using a program like saminside or LC5 rather than start a whole new topic within one.



INTRODUCTION TO DENIAL OF SERVICE



.0. FOREWORD

.A. INTRODUCTION
.A.1. WHAT IS A DENIAL OF SERVICE ATTACK?
.A.2. WHY WOULD SOMEONE CRASH A SYSTEM?
.A.2.1. INTRODUCTION
.A.2.2. SUB-CULTURAL STATUS
.A.2.3. TO GAIN ACCESS
.A.2.4. REVENGE
.A.2.5. POLITICAL REASONS
.A.2.6. ECONOMICAL REASONS
.A.2.7. NASTINESS
.A.3. ARE SOME OPERATING SYSTEMS MORE SECURE?

.B. SOME BASIC TARGETS FOR AN ATTACK
.B.1. SWAP SPACE
.B.2. BANDWIDTH
.B.3. KERNEL TABLES
.B.4. RAM
.B.5. DISKS
.B.6. CACHES
.B.7. INETD

.C. ATTACKING FROM THE OUTSIDE
.C.1. TAKING ADVANTAGE OF FINGER
.C.2. UDP AND SUNOS 4.1.3.
.C.3. FREEZING UP X-WINDOWS
.C.4. MALICIOUS USE OF UDP SERVICES
.C.5. ATTACKING WITH LYNX CLIENTS
.C.6. MALICIOUS USE OF telnet
.C.7. MALICIOUS USE OF telnet UNDER SOLARIS 2.4
.C.8. HOW TO DISABLE ACCOUNTS
.C.9. LINUX AND TCP TIME, DAYTIME
.C.10. HOW TO DISABLE SERVICES
.C.11. PARAGON OS BETA R1.4
.C.12. NOVELLS NETWARE FTP
.C.13. ICMP REDIRECT ATTACKS
.C.14. BROADCAST STORMS
.C.15. EMAIL BOMBING AND SPAMMING
.C.16. TIME AND KERBEROS
.C.17. THE DOT DOT BUG
.C.18. SUNOS KERNEL PANIC
.C.19. HOSTILE APPLETS
.C.20. VIRUS
.C.21. ANONYMOUS FTP ABUSE
.C.22. SYN FLOODING
.C.23. PING FLOODING
.C.24. CRASHING SYSTEMS WITH PING FROM WINDOWS 95 MACHINES
.C.25. MALICIOUS USE OF SUBNET MASK REPLY MESSAGE
.C.26. FLEXlm
.C.27. BOOTING WITH TRIVIAL FTP

.D. ATTACKING FROM THE INSIDE
.D.1. KERNEL PANIC UNDER SOLARIS 2.3
.D.2. CRASHING THE X-SERVER
.D.3. FILLING UP THE HARD DISK
.D.4. MALICIOUS USE OF eval
.D.5. MALICIOUS USE OF fork()
.D.6. CREATING FILES THAT IS HARD TO REMOVE
.D.7. DIRECTORY NAME LOOKUPCACHE
.D.8. CSH ATTACK
.D.9. CREATING FILES IN /tmp
.D.10. USING RESOLV_HOST_CONF
.D.11. SUN 4.X AND BACKGROUND JOBS
.D.12. CRASHING DG/UX WITH ULIMIT 
.D.13. NETTUNE AND HP-UX
.D.14. SOLARIS 2.X AND NFS
.D.15. SYSTEM STABILITY COMPROMISE VIA MOUNT_UNION
.D.16. trap_mon CAUSES KERNEL PANIC UNDER SUNOS 4.1.X

.E. DUMPING CORE
.E.1. SHORT COMMENT
.E.2. MALICIOUS USE OF NETSCAPE
.E.3. CORE DUMPED UNDER WUFTPD
.E.4. ld UNDER SOLARIS/X86

.F. HOW DO I PROTECT A SYSTEM AGAINST DENIAL OF SERVICE ATTACKS?
.F.1. BASIC SECURITY PROTECTION
.F.1.1. INTRODUCTION
.F.1.2. PORT SCANNING
.F.1.3. CHECK THE OUTSIDE ATTACKS DESCRIBED IN THIS PAPER
.F.1.4. CHECK THE INSIDE ATTACKS DESCRIBED IN THIS PAPER
.F.1.5. EXTRA SECURITY SYSTEMS
.F.1.6. MONITORING SECURITY
.F.1.7. KEEPING UP TO DATE
.F.2. MONITORING PERFORMANCE
.F.2.1. INTRODUCTION
.F.2.2. COMMANDS AND SERVICES                      
.F.2.3. PROGRAMS
.F.2.4. ACCOUNTING


.0. FOREWORD
------------

This blog is about

- What is a denial of service attack?
- Why would someone crash a system?
- How can someone crash a system?
- How do I protect a system against denial of service attacks?


.A. INTRODUCTION
~~~~~~~~~~~~~~~~

.A.1. WHAT IS A DENIAL OF SERVICE ATTACK?
-----------------------------------------

Denial of service is about knocking out services without
permission, for example by crashing the whole system. These
kinds of attacks are easy to launch and it is hard to protect
a system against them. The basic problem is that Unix
assumes that users on the system or on other systems will be
well behaved.

.A.2. WHY WOULD SOMEONE CRASH A SYSTEM?
---------------------------------------

.A.2.1. INTRODUCTION
--------------------

Why would someone crash a system? I can think of several reasons,
each presented more precisely in a section of its own,
but in short:

.1. Sub-cultural status.
.2. To gain access.
.3. Revenge.
.4. Political reasons.
.5. Economical reasons.
.6. Nastiness.

I think that number one and six are the more common today, but that
number four and five will be the more common ones in the future.

.A.2.2. SUB-CULTURAL STATUS
---------------------------

After all the information about SYN flooding came out, a bunch of such
attacks were launched around Sweden. The vast majority of these attacks
were not part of an IP-spoof attack; they were "only" denial of service
attacks. Why?

I think that hackers attack systems as a sub-cultural pseudo career
and I think that many denial of service attacks, and here in the
example SYN flooding, were performed for these reasons. I also think
that many hackers begin their career with denial of service attacks.

.A.2.3. TO GAIN ACCESS
----------------------

Sometimes a denial of service attack could be part of an attack to
gain access to a system. At the moment I can think of these reasons
and specific holes:

.1. Some older X-lock versions could be crashed with a
method from the denial of service family, leaving the system
open. Physical access was needed to use the work space afterwards.

.2. SYN flooding could be part of an IP-spoof attack method.

.3. Some program systems could have holes during startup
that could be used to gain root, for example SSH (secure shell).

.4. During an attack it could be useful to crash other machines
in the network or to deny certain persons the ability to access
the system.

.5. A system being booted could also sometimes be subverted,
especially with rarp-boots. If we know which port the machine listens
to (69 could be a good guess) during the boot, we can send false
packets to it and almost totally control the boot.

.A.2.4. REVENGE
---------------

A denial of service attack could be a part of a revenge against a user
or an administrator.

.A.2.5. POLITICAL REASONS
-------------------------

Sooner or later, new or old organizations will understand the potential
of destroying computer systems and find tools to do it.

For example, imagine Bank A lending company B money to build a
factory threatening the environment. Organization C therefore crashes A's
computer system, maybe with help from an employee. The attack could cost
A a great deal of money if the timing is right.

.A.2.6. ECONOMICAL REASONS
--------------------------

Imagine the small company A moving into a business totally dominated by
company B. A's and B's customers place their orders by computer and depend
heavily on the order being carried out within a specific time (A and B could
be stock trading companies). If A and B can't perform the order, the customers
lose money and change company.

As part of a business strategy, A pays a computer expert a sum of money to
get him to crash B's computer systems a number of times. A year later A
is the dominating company.

.A.2.7. NASTINESS
-----------------

I know a person who found a workstation where the user had forgotten to
log out. He sat down and wrote a program that did a kill -9 -1 at a
random time at least 30 minutes after the login time, and placed a call to
the program in the profile file. That is nastiness.

.A.3. ARE SOME OPERATING SYSTEMS MORE SECURE?
---------------------------------------------

This is a hard question to answer and I don't think that comparing
different Unix platforms will give anything. You can't say that
one Unix is more secure against denial of service; it is all up to the
administrator.

A comparison between Windows 95 and NT on one side and Unix on the
other could however be interesting.

Unix systems are much more complex and have hundreds of built-in programs,
services... This always opens up many ways to crash the system from
the inside.

In the normal Windows NT and 95 network there are few ways to crash
the system, although there are methods that will always work.

That means that no big difference between Microsoft and Unix can
be seen regarding inside attacks. But there are a couple of
points left:

- Unix has many more tools and programs for discovering an
attack and monitoring the users. Watching what another user
is up to under Windows is very hard.

- The average Unix administrator probably also has much more
experience than the average Microsoft administrator.

These last two points mean that Unix is more secure against inside
denial of service attacks.

A comparison between Microsoft and Unix regarding outside attacks
is much more difficult. However, I would like to say that the average
Microsoft system on the Internet is more secure against outside
attacks, because it normally runs far fewer services.

.B. SOME BASIC TARGETS FOR AN ATTACK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.B.1. SWAP SPACE
----------------

Most systems have several hundred Mbytes of swap space to
service client requests. The swap space is typically used
for forked child processes which have a short lifetime.
The swap space will therefore almost never be used heavily
in normal cases. A denial of service attack could be based
on a method that tries to fill up the swap space.

.B.2. BANDWIDTH
---------------

If the bandwidth usage is too high the network will be useless. Most
denial of service attacks influence the bandwidth in some way.

.B.3. KERNEL TABLES
-------------------

It is trivial to overflow the kernel tables, which will cause
serious problems on the system. Systems with write-through
caches and small write buffers are especially sensitive.

Kernel memory allocation is also a sensitive target.
The kernel has a kernelmap limit; if the system reaches this
limit it cannot allocate more kernel memory and must be rebooted.
The kernel memory is not only used for RAM, CPUs, screens and so
on, it is also used for ordinary processes. This means that any system
can be crashed, and with a mean (or in some sense good) algorithm, pretty
fast.

For Solaris 2.X the sar command measures and reports
how much kernel memory the system is using, but for SunOS 4.X there
is no such command. This means that under SunOS 4.X you can't even
get a warning. If you use Solaris you should run sar -k 1 to
get the information. netstat -k can also be used and shows how much
memory the kernel has allocated in the subpaging.

.B.4. RAM
---------

A denial of service attack that allocates a large amount of RAM
can cause a great deal of problems. NFS and mail servers are
actually extremely sensitive because they do not need much
RAM and therefore often don't have much RAM. An attack on
an NFS server is trivial. The normal NFS client will do a
great deal of caching, but an NFS client can be anything,
including a program you wrote yourself...

.B.5. DISKS
-----------

A classic attack is to fill up the hard disk, but an attack on
the disks can be so much more. For example, an overloaded disk
can be misused in many ways.

.B.6. CACHES
-------------

A denial of service attack involving caches can be based on a method
to block the cache or to avoid the cache.

These caches are found on Solaris 2.X:

Directory name lookup cache: Associates the name of a file with a vnode.

Inode cache: Caches information read from disk in case it is needed
again.

Rnode cache: Holds information about the NFS filesystem.

Buffer cache: Caches inode indirect blocks and cylinders related to disk
I/O.

.B.7. INETD
-----------

Well, once inetd has crashed, all other services running through inetd
will no longer work.


.C. ATTACKING FROM THE OUTSIDE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.C.1. TAKING ADVANTAGE OF FINGER
--------------------------------

Most fingerd installations support redirection to another host.

Ex:

$finger @system.two.com@system.one.com

In the example, finger will go through system.one.com and on to
system.two.com. As far as system.two.com knows, it is system.one.com
that is fingering. So this method can be used for hiding, but also
for a very dirty denial of service attack. Look at this:

$ finger @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@host.we.attack

All those @ signs will get finger to finger host.we.attack again and
again and again... The effect on host.we.attack is powerful and
the result is high bandwidth usage, little free memory and a hard disk
with less free space, due to all the child processes (compare with .D.5.).

The solution is to install a fingerd that doesn't support redirection,
for example GNU finger. You could also turn the finger service off,
but I think that is just a bit too much.
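
What "doesn't support redirection" means can be sketched in a few lines. This is only an illustration of the check, assuming the daemon has the query line available as a string; the function names are hypothetical and not taken from GNU finger or any other fingerd.

```python
def is_forwarding_request(query):
    """A query containing '@' asks the daemon to pass the request on."""
    return "@" in query.strip()

def answer(query):
    # A daemon without redirection support refuses instead of relaying.
    if is_forwarding_request(query):
        return "Forwarding service denied."
    return "local lookup for %r" % query.strip()

print(answer("alice"))
print(answer("@system.two.com"))
```

With such a check in place, a long chain of @ signs is refused at the first hop instead of spawning a chain of child processes.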

.C.2. UDP AND SUNOS 4.1.3.
--------------------------

SunOS 4.1.3. is known to reboot if a packet with incorrect information
in the header is sent to it. This is the case if the ip_options
indicate a wrong size for the packet.

The solution is to install the proper patch.

.C.3. FREEZING UP X-WINDOWS
---------------------------

If a host accepts a telnet session to the X-Windows port (generally
somewhere between 6000 and 6025, in most cases 6000), that could
be used to freeze up the X-Windows system. This can be done with
multiple telnet connections to the port or with a program which
sends multiple XOpenDisplay() calls to the port.

The same thing can happen to Motif or Open Windows.

The solution is to deny connections to the X-Windows port.

.C.4. MALICIOUS USE OF UDP SERVICES
-----------------------------------

It is simple to get UDP services (echo, time, daytime, chargen) to
loop, due to trivial IP-spoofing. The effect can be bandwidth usage so
high that the network becomes useless. In the example the header
claims that the packet came from 127.0.0.1 (loopback) and the target
is the echo port at system.we.attack. As far as system.we.attack knows,
127.0.0.1 is system.we.attack, and the loop has been established.

Ex:

from-IP=127.0.0.1
to-IP=system.we.attack
Packet type:UDP
from UDP port 7
to UDP port 7

Note that the name system.we.attack looks like a DNS-name, but the
target should always be represented by the IP-number.

Quoted from proberts@clark.net (Paul D. Robertson) comment on
comp.security.firewalls on matter of "Introduction to denial of service"

" A great deal of systems don't put loopback on the wire, and simply
emulate it.  Therefore, this attack will only effect that machine
in some cases.  It's much better to use the address of a different
machine on the same network.  Again, the default services should
be disabled in inetd.conf.  Other than some hacks for mainframe IP
stacks that don't support ICMP, the echo service isn't used by many
legitimate programs, and TCP echo should be used instead of UDP
where it is necessary. "
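
Besides disabling the services in inetd.conf, a packet filter can recognize the pattern. The following is a minimal sketch of the two checks, assuming the filter can see the UDP ports, the source address and the interface a packet arrived on; the names are my own and do not correspond to any real firewall's rule syntax.

```python
import ipaddress

# The small UDP services named above: echo, daytime, chargen, time.
LOOP_PRONE_UDP_PORTS = {7, 13, 19, 37}

def is_loop_candidate(src_port, dst_port):
    """True if both ends are services that answer any datagram they get."""
    return src_port in LOOP_PRONE_UDP_PORTS and dst_port in LOOP_PRONE_UDP_PORTS

def is_spoofed_loopback(src_ip, arrived_on_loopback):
    """A loopback source address must never arrive from the wire."""
    return ipaddress.ip_address(src_ip).is_loopback and not arrived_on_loopback

# The packet from the example: 127.0.0.1, port 7 to port 7, from the network.
print(is_loop_candidate(7, 7), is_spoofed_loopback("127.0.0.1", False))
```

Either check alone is enough to drop the packet from the example.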

.C.5. ATTACKING WITH LYNX CLIENTS
---------------------------------

A World Wide Web server will fork an httpd process as a response
to a request from a client, typically Netscape or Mosaic. The process
lasts for less than one second and the load will therefore never
show up if someone uses ps. In most cases it is therefore very
safe to launch a denial of service attack that makes use of
multiple W3 clients, typically lynx clients. But note that the netstat
command could be used to detect the attack (thanks to Paul D. Robertson).

Some httpd:s (for example http-gw) will have problems besides the normal
high bandwidth, low memory... And the attack can in those cases get
the server to loop (compare with .C.6.)

.C.6. MALICIOUS USE OF telnet
-----------------------------

Study this little script:

Ex:

while : ; do
telnet system.we.attack &
done

An attack using this script might eat some bandwidth, but that is
nothing compared to the finger method or most other methods. The
point is that some pretty common firewalls and httpd:s think
that the attack is a loop and shut themselves down until the
administrator sends kill -HUP.

This is a simple high-risk vulnerability that should be checked for
and, if present, fixed.

.C.7. MALICIOUS USE OF telnet UNDER SOLARIS 2.4
-----------------------------------------------

If the attacker makes a telnet connection to the Solaris 2.4 host and
quits using:

Ex:

Control-}
quit

then inetd will keep going "forever". Well, a couple of hundred...

The solution is to install the proper patch.

.C.8. HOW TO DISABLE ACCOUNTS
-----------------------------

Some systems disable an account after N bad logins, or wait
N seconds. You can use this feature to lock out specific users from
the system.

.C.9. LINUX AND TCP TIME, DAYTIME
----------------------------------

Inetd under Linux is known to crash if too many SYN packets are sent to
daytime (port 13) and/or time (port 37).

The solution is to install the proper patch.

.C.10. HOW TO DISABLE SERVICES
------------------------------

Most Unix systems disable a service after N sessions have been
opened in a given time. Most systems have a reasonable default
(let's say 800 - 1000), but not some SunOS systems that have the
default set to 48...

The solution is to set the number to something reasonable.
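
Why the size of N matters can be shown with a toy model. This is a sketch of the general idea only, not inetd's actual implementation: the class name, the 60 second window and the session timings are assumptions chosen for illustration.

```python
from collections import deque

class ServiceThrottle:
    """Disable a service once more than max_sessions start within a window."""

    def __init__(self, max_sessions, window_seconds):
        self.max_sessions = max_sessions
        self.window = window_seconds
        self.starts = deque()   # start times of recent sessions
        self.disabled = False

    def accept(self, now):
        if self.disabled:
            return False
        # Forget sessions that started before the current window.
        while self.starts and now - self.starts[0] >= self.window:
            self.starts.popleft()
        self.starts.append(now)
        if len(self.starts) > self.max_sessions:
            self.disabled = True
            return False
        return True

# 60 sessions in 6 seconds against a limit of 48 and a limit of 1000.
low = ServiceThrottle(48, 60)
high = ServiceThrottle(1000, 60)
low_ok = sum(low.accept(i * 0.1) for i in range(60))
high_ok = sum(high.accept(i * 0.1) for i in range(60))
print(low_ok, low.disabled, high_ok, high.disabled)
```

With the low limit, a short burst of sessions is enough to switch the service off for everyone; with the higher limit the same burst passes unnoticed.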

.C.11. PARAGON OS BETA R1.4
---------------------------

If someone sends an ICMP (Internet Control Message Protocol) redirect
packet to a Paragon OS beta R1.4 machine, it will freeze up and must be
rebooted. An ICMP redirect tells the system to override routing
tables. Routers use this to tell the host that it is sending
to the wrong router.

The solution is to install the proper patch.

.C.12. NOVELLS NETWARE FTP
--------------------------

Novell's Netware FTP server is known to run short of memory if multiple
ftp sessions connect to it.

.C.13. ICMP REDIRECT ATTACKS
----------------------------

Gateways use ICMP redirects to tell the system to override routing
tables, that is, telling the system to take a better route. To be able
to misuse ICMP redirection we must know of an existing connection
(well, we could make one ourselves, but there is not much use in that).
If we have found a connection we can send a route that
loses its connectivity, or we could send false messages to the host
if the connection we have found doesn't use encryption.

Ex: (false messages to send)

DESTINATION UNREACHABLE 
TIME TO LIVE EXCEEDED
PARAMETER PROBLEM
PACKET TOO BIG

The effect of such messages is a reset of the connection.

The solution could be to turn ICMP redirects off; there is not much
proper use for the service.

.C.14. BROADCAST STORMS
-----------------------

This is a very popular method in networks where all of the hosts are
acting as gateways.

There are many versions of the attack, but the basic method is to
send a lot of packets to all hosts in the network with a destination
that doesn't exist. Each host will try to forward each packet, so
the packets will bounce around for a long time. And if new packets
keep coming the network will soon be in trouble.

Services that can be misused as tools in this kind of attack are, for
example, ping, finger and sendmail. But most services can be misused
in some way or another.

.C.15. EMAIL BOMBING AND SPAMMING
---------------------------------

In an email bombing attack the attacker will repeatedly send identical
email messages to an address. The effect on the target is high bandwidth
usage, a hard disk with less space and so on... Email spamming is about
sending mail to all (or rather many) of the users of a system. The point of
using spamming instead of bombing is that some users will try to
send a reply, and if the address is false the mail will bounce back. In
that case one mail has turned into three mails. The effect on the
bandwidth is obvious.

There is no way to prevent email bombing or spamming. However, have
a look at CERT's paper "Email bombing and spamming".

.C.16. TIME AND KERBEROS
------------------------

If the clocks of the source and target machines are not closely aligned,
the ticket will be rejected. That means that if the protocol that sets
the time is not protected, it will be possible to put a Kerberos server
out of function.
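
The clock check itself is simple, which is why an unprotected time protocol is enough to break it. Here is a sketch of the comparison, assuming a tolerance of five minutes (a commonly used default, but an assumption here, as are the function and variable names).

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)   # assumed tolerance

def ticket_time_ok(client_time, server_time, max_skew=MAX_SKEW):
    """Accept only if the two clocks differ by at most max_skew."""
    return abs(server_time - client_time) <= max_skew

server = datetime(1996, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
print(ticket_time_ok(server + timedelta(minutes=2), server))   # small drift
print(ticket_time_ok(server + timedelta(minutes=20), server))  # clock was moved
```

Anyone who can move either clock by more than the tolerance makes every ticket fail this test, so the time source needs the same protection as the Kerberos server itself.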

.C.17. THE DOT DOT BUG
----------------------

The Windows NT file sharing system is vulnerable to the dot dot bug
famous from Windows 95 (dot dot as in ..). This means that anyone can
crash the system. If someone sends a "DIR ..\" to the workstation, a
STOP message will appear on the screen of the Windows NT computer. Note
that it applies to versions 3.50 and 3.51 of both the workstation and
server editions.

The solution is to install the proper patch.

.C.18. SUNOS KERNEL PANIC
-------------------------

Some SunOS systems (running TIS?) will get a kernel panic if a
getsockopt() is done after a connection has been reset.

The solution could be to install Sun patch 100804.

.C.19. HOSTILE APPLETS
----------------------

A hostile applet is any applet that attempts to use your system
in an inappropriate manner. The problems in the Java language
can be sorted into two main groups:

1) Problems due to bugs.
2) Problems due to features in the language.

In group one we have for example the Java bytecode verifier bug, which
makes it possible for an applet to execute any command that the user
can execute. This means that all the attack methods described in .D.X.
could be executed through an applet. The Java bytecode verifier bug
was discovered in late March 1996 and no patch has yet been made available
(correct me if I'm wrong!!!).

Note that two other bugs could be found in group one, but they
are both fixed in Netscape 2.01 and JDK 1.0.1.

Group two is more interesting, and one large problem found is the
fact that Java can connect to the ports. This means that all the methods
described in .C.X. can be performed by an applet. More information
and examples can be found at this address:

http://www.math.gatech.edu/~mladue/HostileArticle.html

If you need a high level of security you should use some sort of
firewall for protection against Java. As a user you could have
Java disabled.

.C.20. VIRUS
------------

Computer viruses are written for the purpose of spreading and
destroying systems. Viruses are still the most common and famous
denial of service attack method.

It is a misunderstanding that virus writing is hard. If you know
assembly language and have source code for a couple of viruses it
is easy. Several automatic toolkits for virus construction can
also be found, for example:

* Genvir.
* VCS (Virus Construction Set).
* VCL (Virus Construction Laboratory).
* PS-MPC (Phalcon/Skism - Mass Produced Code Generator).
* IVP (Instant Virus Production Kit).
* G2 (G Squared).

PS-MPC and VCL are known to be the best and can help the novice programmer
learn how to write viruses.

An automatic tool called MtE can also be found. MtE will transform
a virus into a polymorphic virus. The polymorphic engine of MtE is well
known and should easily be caught by any scanner.

.C.21. ANONYMOUS FTP ABUSE
--------------------------

If an anonymous FTP archive has a writable area it could be misused
for a denial of service attack similar to .D.3. That is, we can
fill up the hard disk.

A host can also become temporarily unusable through massive numbers of
FTP requests.

For more information on how to protect an anonymous FTP site,
CERT's "Anonymous FTP Abuses" could be a good start.

.C.22. SYN FLOODING
-------------------

Both 2600 and Phrack have posted information about the SYN flooding attack.
2600 has also posted exploit code for the attack.

As we know, the SYN packet is used in the 3-way handshake. The SYN flooding
attack is based on an incomplete handshake. That is, the attacking host
will send a flood of SYN packets but will not respond with an ACK packet.
The TCP/IP stack will wait a certain amount of time before dropping
the connection, so a SYN flooding attack will keep the syn_received
connection queue of the target machine filled.
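
The queue behaviour described above can be modelled without touching a network. This is a toy simulation only: the class name, the queue size of 5 and the 75 second timeout are assumptions picked to keep the numbers small, not values taken from any particular TCP/IP stack.

```python
class SynBacklog:
    """Toy model of a queue of half-open (syn_received) connections."""

    def __init__(self, size, timeout):
        self.size = size
        self.timeout = timeout
        self.half_open = {}   # connection id -> time the SYN arrived

    def expire(self, now):
        # Drop entries that have waited longer than the timeout for an ACK.
        self.half_open = {c: t for c, t in self.half_open.items()
                          if now - t < self.timeout}

    def on_syn(self, conn_id, now):
        self.expire(now)
        if len(self.half_open) >= self.size:
            return False          # queue full, the new connection is dropped
        self.half_open[conn_id] = now
        return True

    def on_ack(self, conn_id):
        # A completed handshake frees the slot immediately.
        return self.half_open.pop(conn_id, None) is not None

queue = SynBacklog(size=5, timeout=75)
filled = [queue.on_syn(("unanswered", i), now=0) for i in range(5)]
refused = queue.on_syn("legitimate", now=1)     # no room while the queue is full
accepted = queue.on_syn("legitimate", now=80)   # the old entries have timed out
print(filled, refused, accepted)
```

The model also shows the two knobs a defender has: a larger queue and a shorter timeout both shrink the time during which legitimate connections are refused.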

The SYN flooding attack is very hot and it is easy to find more information
about it, for example:

[.1.] http://www.eecs.nwu.edu/~jmyers/bugtraq/1354.html
Article by Christopher Klaus, including a "solution".

[.2.] http://jya.com/floodd.txt
2600, Summer, 1996, pp. 6-11. FLOOD WARNING by Jason Fairlane

[.3.] http://www.fc.net/phrack/files/p48/p48-14.html
IP-spoofing Demystified by daemon9 / route / infinity
      for Phrack Magazine

.C.23. PING FLOODING
--------------------

I haven't tested how big the impact of a ping flooding attack is, but
it might be quite big.

Under Unix we could try something like: ping -s host
to send 64 bytes packets.

If you have Windows 95, click the start button, select RUN, then type
in: PING -T -L 256 xxx.xxx.xxx.xx. Start about 15 sessions.

.C.24. CRASHING SYSTEMS WITH PING FROM WINDOWS 95 MACHINES
----------------------------------------------------------

If someone can ping your machine from a Windows 95 machine he or she might
reboot or freeze your machine. The attacker simply writes:

ping -l 65510 address.to.the.machine

And the machine will freeze or reboot.

Works for kernel 2.0.7 up to version 2.0.20. and 2.1.1. for Linux (crash).
AIX4, OSF, HPUX 10.1, DUnix 4.0 (crash).
OSF/1, 3.2C, Solaris 2.4 x86 (reboot).
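
The reason the number 65510 matters is plain arithmetic: the IP total length field is 16 bits, so a datagram can be at most 65535 bytes, and the payload comes on top of the headers. The sketch below assumes a 20 byte IP header without options and the 8 byte ICMP echo header; the function names are mine. The second function is the kind of check a patched stack performs during reassembly.

```python
MAX_IP_DATAGRAM = 65535   # largest value of the 16-bit total length field
IP_HEADER = 20            # assumed: no IP options
ICMP_HEADER = 8

def echo_datagram_size(payload_bytes):
    return IP_HEADER + ICMP_HEADER + payload_bytes

def fragment_overflows(offset_units, data_len):
    """Fragment offsets count 8-byte units; reject data ending past the limit."""
    return IP_HEADER + offset_units * 8 + data_len > MAX_IP_DATAGRAM

largest_legal_payload = MAX_IP_DATAGRAM - IP_HEADER - ICMP_HEADER
print(echo_datagram_size(65510), largest_legal_payload)
print(fragment_overflows(8189, 1000), fragment_overflows(0, 1480))
```

A payload of 65510 bytes therefore describes a datagram of 65538 bytes, three more than the format allows, and a stack that reassembles the fragments without a check like fragment_overflows() writes past the end of its buffer.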

.C.25. MALICIOUS USE OF SUBNET MASK REPLY MESSAGE
--------------------------------------------------

The subnet mask reply message is used during the reboot, but some
hosts are known to accept the message at any time without any check.
If so, all communication to or from the host is turned off; it's dead.

The host should not accept the message at any time except during the reboot.

.C.26. FLEXlm
-------------

Any host running FLEXlm can get the FLEXlm license manager daemon
on any network to shut down using the FLEXlm lmdown command.

# lmdown -c /etc/licence.dat
lmdown - Copyright (C) 1989, 1991 Highland Software, Inc.

Shutting down FLEXlm on nodes: xxx
Are you sure? [y/n]: y
Shut down node xxx
#

.C.27. BOOTING WITH TRIVIAL FTP
-------------------------------

To boot diskless workstations one often uses trivial ftp with rarp or
bootp. If not protected, an attacker can use tftp to boot the host.


.D. ATTACKING FROM THE INSIDE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.D.1. KERNEL PANIC UNDER SOLARIS 2.3
------------------------------------

Solaris 2.3 will get a kernel panic if this
is executed:

EX:

$ndd /dev/udp udp_status

The solution is to install the proper patch.

.D.2. CRASHING THE X-SERVER
---------------------------

If the sticky bit is not set on /tmp, then the file /tmp/.x11-unix/x0
can be removed and the x-server will crash.

Ex:

$ rm /tmp/.x11-unix/x0
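
Whether a directory is protected can be checked from the mode bits. The sketch below uses Python's os and stat modules and tries the check on a scratch directory rather than the real /tmp; the function name is my own.

```python
import os
import stat
import tempfile

def has_sticky_bit(path):
    """True if only a file's owner (or root) may remove files in path."""
    return bool(os.stat(path).st_mode & stat.S_ISVTX)

# Demonstrate on a scratch directory instead of the real /tmp.
with tempfile.TemporaryDirectory() as scratch:
    os.chmod(scratch, 0o1777)        # world-writable with the sticky bit
    protected = has_sticky_bit(scratch)
    os.chmod(scratch, 0o777)         # world-writable without it
    unprotected = has_sticky_bit(scratch)

print(protected, unprotected)
```

Running has_sticky_bit("/tmp") on the system in question answers the question directly; if it returns False, chmod 1777 /tmp as root sets the bit.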

.D.3. FILLING UP THE HARD DISK
-----------------------------

If your hard disk space is not limited by a quota or if you can use
/tmp then it's possible for you to fill up the file system.

Ex:

while : ;
mkdir .xxx
cd .xxx
done

.D.4. MALICIOUS USE OF eval
---------------------------

Some older systems will crash if eval '\!\!' is executed in the
C-shell.

Ex:

% eval '\!\!'

.D.5. MALICIOUS USE OF fork() 
-----------------------------

If someone executes this C++ program the result will be a crash
on most systems.

Ex:

#include
#include
#include

main()
{
int x;
while(x=0;x<1000000 b="b" x="x">
{
system("uptime");
fork();
}
}

You can use any command you want, but uptime is nice
because it shows the workload.

To get a bigger and very ugly attack you should however replace uptime
(or fork them both) with sync. This is very bad.

If you are really mean you could also fork a child process for
every child process, and we will get an exponential increase in
workload.

There is no good way to stop this attack and
similar attacks. A solution could be to place a limit
on the time of execution and the size of processes.
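
Such limits can be set per process with the resource limits that Unix already provides. The following is a sketch using Python's resource module, assuming a Unix system; the function names and the example values are mine, and the limit on the number of processes is an addition of my own that goes beyond the two limits named above.

```python
import resource

def clamp_to_hard(requested, hard):
    """A soft limit may not exceed the hard limit already in force."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)

def limit_current_process(cpu_seconds, max_bytes, max_processes):
    """Lower the soft limits of the calling process (inherited by children)."""
    wanted = {
        resource.RLIMIT_CPU: cpu_seconds,      # time of execution
        resource.RLIMIT_AS: max_bytes,         # size of the process
        resource.RLIMIT_NPROC: max_processes,  # assumption: caps fork() too
    }
    for which, value in wanted.items():
        _soft, hard = resource.getrlimit(which)
        resource.setrlimit(which, (clamp_to_hard(value, hard), hard))

print(clamp_to_hard(10, resource.RLIM_INFINITY), clamp_to_hard(10, 3))
```

limit_current_process() is meant to be called in a child just before it runs untrusted work, for example as the preexec_fn of subprocess.Popen, so that the limits apply to that job and not to the shell that started it.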

.D.6. CREATING FILES THAT IS HARD TO REMOVE
-------------------------------------------

Well, all files can be removed, but here are some ideas:

Ex.I.

$ cat > -xxx
^C
$ ls
-xxx
$ rm -xxx
rm: illegal option -- x
rm: illegal option -- x
rm: illegal option -- x
usage: rm [-fiRr] file ...
$

Ex.II.

$ touch xxx!
$ rm xxx!
rm: remove xxx! (yes/no)? y
$ touch xxxxxxxxx!
$ rm xxxxxxxxx!
bash: !": event not found
$

(You see, the size does count!)

Other well-known methods are files with odd characters or spaces
in the name.

These methods could be used in combination with ".D.3 FILLING UP THE
HARDDISK". If you do want to remove these files you must use some sort
of script or a graphical interface like OpenWindow:s File
Manager. You can also try to use: rm ./-xxx. It should work for
the first example if you have a shell.

.D.7. DIRECTORY NAME LOOKUPCACHE
--------------------------------

The directory name lookup cache (DNLC) is used whenever a file is opened.
The DNLC associates the name of a file with a vnode. But the DNLC can only
handle file names shorter than N characters (up to 14 characters on
SunOS 4.x, up to 30 characters on Solaris 2.x). This means
that it's dead easy to launch a pretty discreet denial of service attack.

Create lets say 20 directories (for a start) and put 10 empty files in
every directory. Let every name have over 30 characters and execute a
script that makes a lot of ls -al on the directories.

If the impact is not big enough you should create more files or launch
more processes.

.D.8. CSH ATTACK
----------------

Just start this under /bin/csh (after proper modification)
and the load level will get very high (that is 100% of the cpu time)
in a very short time.

Ex:

|I /bin/csh
nodename : **************b

.D.9. CREATING FILES IN /tmp
----------------------------

Many programs create files in /tmp but are unable to cope
if the file already exists. In some cases this could be used for a
denial of service attack.

.D.10. USING RESOLV_HOST_CONF
-----------------------------

Some systems have a small security hole in the way they use the
RESOLV_HOST_CONF variable. We can put a file name in it and,
through ping, read confidential data like /etc/shadow or
crash the system. Most systems will crash if the variable points
to /proc/kcore and is then accessed through ping.

Ex:

$ export RESOLV_HOST_CONF="/proc/kcore" ; ping asdf

.D.11. SUN 4.X AND BACKGROUND JOBS
----------------------------------

" Put the string "a&" in a file called "a" and perform "chmod +x a".
Running "a" will quickly disable a Sun 4.x machine, even disallowing
(counter to specs) root login as the kernel process table fills."

" The cute thing is the size of the
script, and how few keystrokes it takes to bring down a Sun
as a regular user."

.D.12. CRASHING DG/UX WITH ULIMIT 
---------------------------------

ulimit is used to set a limit on the system resources available to the
shell. If ulimit 0 is called before /etc/passwd, under DG/UX, the
passwd file will be set to zero.

.D.13. NETTUNE AND HP-UX
------------------------

/usr/contrib/bin/nettune is SETUID root on HP-UX meaning
that any user can reset all ICMP, IP and TCP kernel
parameters, for example the following parameters:

- arp_killcomplete 
- arp_killincomplete
- arp_unicast 
- arp_rebroadcast
- icmp_mask_agent
- ip_defaultttl
- ip_forwarding
- ip_intrqmax
- pmtu_defaulttime
- tcp_localsubnets
- tcp_receive
- tcp_send
- tcp_defaultttl
- tcp_keepstart 
- tcp_keepfreq
- tcp_keepstop
- tcp_maxretrans
- tcp_urgent_data_ptr
- udp_cksum
- udp_defaultttl 
- udp_newbcastenable 
- udp_pmtu
- tcp_pmtu
- tcp_random_seq

The solution could be to remove the setuid bit from
/usr/contrib/bin/nettune:

# chmod u-s /usr/contrib/bin/nettune

.D.14. SOLARIS 2.X AND NFS
--------------------------

If a process is writing over NFS and the user goes over the disk
quota, the process will go into an infinite loop.

.D.15. SYSTEM STABILITY COMPROMISE VIA MOUNT_UNION
--------------------------------------------------

By executing a sequence of mount_union commands any user
can cause a system reload on all FreeBSD version 2.X before
1996-05-18.

$ mkdir a
$ mkdir b
$ mount_union ~/a ~/b
$ mount_union -b ~/a ~/b

The solution could be to set the proper permission on
/sbin/mount_union:

#chmod u-s /sbin/mount_union

.D.16. trap_mon CAUSES KERNEL PANIC UNDER SUNOS 4.1.X
----------------------------------------------------

Executing the trap_mon instruction from user mode can cause
a kernel panic or a window underflow watchdog reset under
SunOS 4.1.x, sun4c architecture.


.E. DUMPING CORE
~~~~~~~~~~~~~~~~

.E.1. SHORT COMMENT
-------------------

The core dump items don't really belong in this paper, but I have
put them here anyway.

.E.2. MALICIOUS USE OF NETSCAPE
-------------------------------

Under Netscape 1.1N this link will result in a segmentation fault and a
core dump.

Ex:

xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxxxxx.xxx.xxx. xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxxxxx.xxx.xxx.xxx.xxx.xxx.
xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxxxxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx.
xxx.xxx.xxx.xxx.xxxxxx.xxx.xxx.xxx.xxx.xxx...>

.E.3. CORE DUMPED UNDER WUFTPD
------------------------------

A core dump can be produced under wuftpd in two different
ways:

(1) When pasv is given while the user is not logged in (ftp -n). Almost all
versions of BSD's ftpd.
(2) When more than 100 arguments are given with any executable
command. Present in all versions of BSD's ftpd.

.E.4. ld UNDER SOLARIS/X86
--------------------------

Under Solaris 2.4/X86, ld dumps core if given the -s option.


.F. HOW DO I PROTECT A SYSTEM AGAINST DENIAL OF SERVICE ATTACKS?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.F.1. BASIC SECURITY PROTECTION
-------------------------------

.F.1.1. INTRODUCTION
--------------------

You cannot make your system totally secure against denial of service
attacks, but you can do a lot against attacks from the outside. I put this
checklist together and hope that it can be of some use.

.F.1.2. SECURITY PATCHES
------------------------

Always install the proper security patches. I don't want to list patch
numbers here, but that doesn't matter because you want to check
that you have all security patches installed anyway,
so get a list and check! Also note that patches change over time and
that a solution suggested in security bulletins (e.g. from CERT) is often
somewhat temporary.

.F.1.3. PORT SCANNING
---------------------

Check which services you have. Don't rely on the manual
or some configuration file; instead scan the ports with sprobe
or some other port scanner. Actually, you should do this regularly to see
that nobody has installed a service that you don't want on
the system (for example a service used for a pirate site).

Disable every service that you don't need, for example rexd,
fingerd, systat, netstat, rusersd, sprayd, pop3, uucpd, echo, chargen,
tftp, exec, ufs, daytime, time... Any combination of echo, time, daytime
and chargen can be made to loop. There is however no need
to turn discard off. The discard service just reads a packet
and discards it, so turning it off makes you more sensitive to
denial of service, not less.

Services that can be used for denial of service and brute force
hacking without any logging can actually be found on many systems. For
example, stock rexec never logs anything. Most popds also don't log
anything.
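
If no dedicated scanner is at hand, the check on your own machine can
be approximated with a few lines of Python. This is a rough sketch, not
a replacement for a real port scanner: it only tries plain TCP connects,
and the function name open_tcp_ports is made up for this example.

```python
import socket


def open_tcp_ports(host, ports, timeout=0.5):
    """Return the ports on host that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found
```

Calling open_tcp_ports("127.0.0.1", range(1, 1025)) on your own system
lists the low ports that answer; compare the result with the services
you expect to be running.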

.F.1.4. CHECK THE OUTSIDE ATTACKS DESCRIBED IN THIS BLOG
---------------------------------------------------------

Check the attacks described in this blog and look at the
solutions. Some attacks you should perform yourself to see if they
apply to your system, for example:

- Freezing up X-Windows.
- Malicious use of telnet.
- How to disable services.
- SunOS kernel panic.
- Attacking with lynx clients.
- Crashing systems with ping from Windows 95 machines.

That is, stress test your system with several services and look at
the effect.

Note that Solaris 2.4 and later have a limit on the number of ICMP
error messages (1 per 500 ms I think) that can cause problems when
you test your system for some of the holes described in this paper.
But you can easily solve this problem by executing this line:

$ /usr/sbin/ndd -set /dev/ip ip_icmp_err_interval 0
                                                           
.F.1.5. CHECK THE INSIDE ATTACKS DESCRIBED IN THIS PAPER
--------------------------------------------------------

Check the inside attacks. Although it is always possible to crash
the system from the inside, you don't want it to be too easy. Several
of the attacks also have applications besides denial of service,
for example:

- Crashing the X-Server: If the sticky bit is not set on /tmp,
  a number of attacks to gain access can be performed.

- Using resolv_host_conf: Could be used to expose confidential
  data like /etc/shadow.

- Core dumped under wuftpd: Could be used to extract
  password strings.

Where I have not given a solution, I may have recommended some other paper.
If not, I don't know of a paper with a solution that I feel I can recommend.
In these cases you should check with your company.

.F.1.6. EXTRA SECURITY SYSTEMS
------------------------------

Also think about whether you should install some extra security systems.
The basics that you should always install are a logdaemon and a wrapper.
A firewall could also be very good, but expensive. Free tools that can
be found on the Internet include:

TYPE:      NAME:         URL:

LOGDAEMON  NETLOG        ftp://net.tamu.edu/pub/security/TAMU
WRAPPER    TCP WRAPPERS  ftp://cert.org/pub/tools/tcp_wrappers
FIREWALL   TIS           ftp://ftp.tis.com/pub/firewalls/toolkit

Note that you should be very careful if building your own firewall with
TIS or you might open up new and very bad security holes, but it is a very
good security package if you have some basic knowledge.

It is also very good to replace services that you need, for example telnet,
rlogin, rsh or whatever, with a tool like ssh. Ssh is free and can be
found at URL:

ftp://ftp.cs.hut.fi/pub/ssh


.F.1.7. MONITORING SECURITY
---------------------------

Also monitor security regularly, for example by examining system log
files, history files... Even on a system without any extra security systems,
several tools can be found for monitoring, for example:

- uptime
- showmount
- ps
- netstat
- finger


.F.1.8. KEEPING UP TO DATE
--------------------------

It is very important to keep up to date with security problems. Also
understand that when, for example, CERT warns about something, it has often
been public on the dark side for some time, so don't wait. The following
resources on the Internet can help you keep up to date:

- CERT mailing list. Send an e-mail to cert@cert.org to be placed
on the list.

- Bugtraq mailing list. Send an e-mail to bugtraq-request@fc.net.

- WWW-security mailing list. Send an e-mail to
www-security@ns2.rutgers.edu.


.F.2. MONITORING PERFORMANCE
----------------------------

.F.2.1. INTRODUCTION
--------------------

There are several commands and services that can be used for
monitoring performance, and at least two good free programs can
be found on the Internet.

.F.2.2. COMMANDS AND SERVICES
-----------------------------

For more information read the man text.

netstat  Show network status.
nfsstat  Show NFS statistics.
sar      System activity reporter.
vmstat   Report virtual memory statistics.
timex    Time a command, report process data and system
         activity.
time     Time a simple command.
truss    Trace system calls and signals.
uptime   Show how long the system has been up.

Note that if a public netstat server can be found you might be able
to use netstat from the outside. netstat can also give information
like tcp sequence numbers and much more.

.F.2.3. PROGRAMS
----------------

Proctool: Proctool is a freely available tool for Solaris that monitors
and controls processes.
ftp://opcom.sun.ca/pub/binaries/

Top: Top might be a simpler program than Proctool, but it is
good enough.

.F.2.4. ACCOUNTING
------------------

To monitor performance you have to collect information over a long
period of time. All Unix systems have some sort of accounting logs
to identify how much CPU time and memory each program uses. You should
check your manual to see how to set this up.

You could also build your own accounting system by using crontab and
a script with the commands you want to run. Let crontab run the script
every day and compare the information once a week. You could for
example let the script run the following commands:

- netstat
- iostat -D
- vmstat
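
Such a script could look roughly like the sketch below. It is only an
illustration: the function names and the log layout are invented for
this example, and the commands (in particular the iostat -D option)
must exist on your platform.

```python
import subprocess
import time

# Commands taken from the list above; adjust for your platform.
COMMANDS = [["netstat"], ["iostat", "-D"], ["vmstat"]]


def snapshot(commands):
    """Run each command and return one timestamped text report."""
    lines = [time.strftime("=== %Y-%m-%d %H:%M:%S ===")]
    for cmd in commands:
        lines.append("$ " + " ".join(cmd))
        try:
            done = subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=60)
            lines.append(done.stdout.rstrip())
        except (OSError, subprocess.TimeoutExpired) as err:
            # A missing or hung tool should not stop the other commands.
            lines.append("(failed: %s)" % err)
    return "\n".join(lines) + "\n"


def append_snapshot(path, commands=COMMANDS):
    """Append one report to the log file at path."""
    with open(path, "a") as log:
        log.write(snapshot(commands))
```

A crontab entry that calls append_snapshot once a day builds up a log
that can be compared week by week.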


How Search Engines Work


1-Search Engines

     A--How do they work?

     B--Subject Trees

2-Information Retrieval Concepts

3-Fuzzy Queries

4-Using Logical Expressions, Boolean Operators

5-More Signal and Less Noise, Deeper into Boolean expressions.

6-Meta Search Engines

7-Advanced Search Features (not for the jumpy people)

8-Date meta tags

9-Building a search engine in PHP

10-Final Note

\***************/





Okay, before we start, I assume you know how to use a browser and have some knowledge of how the Internet works.  This text file is not something that will make you very smart or "elite," but it will show you how search engines work and how to take full advantage of them with faster and clearer queries.



Let's start. There are two major tools for locating information on the web: subject trees and search engines.  Now I am positive you have seen some of these thingies, but you sit there and try to search for something simple, and all of a sudden you get 1739739 results for the word "Hacking."  Now do you have time to sit and browse through all those sites, or are you going to give up? I think you're going to start picking random sites, thus wasting your precious time, but if you like wasting time STOP READING THIS FILE.  Well, since we don't like to waste time, I'll show you easier ways to sharpen your browsing, make your exploration more productive, and exercise some self-discipline in order to keep your time online focused.

.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.



1-Search Engines:



All of us online are always looking for stuff to browse, but most of the time we waste time just trying to find those good sites.  For many people, searching the web is synonymous with using a search engine. Search engines are similar to the online card catalog you use to locate a book in a library. Even though search engines give you many search options and features, the ones with keyword search systems work best when you can tell them exactly what you're looking for.  Many of you think, ohh man those just suck, but believe it or not, they are powerful tools that can locate resources you might never find in any other way.

.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.

A-So how do they work?



Search engines are based on software programs called "spiders," which scan the web periodically, collecting urls (Uniform Resource Locators). Spiders usually start with a list of "seed" urls. They watch for hyperlinks to other urls and add any url not already in their master list.  Each new url is then visited in turn and scanned for more urls, and you know, they just keep visiting and visiting.  Spiders collect web pages for search engines, and the search engines keep sending spiders out onto the web to remain current. Each page found is indexed for keyword retrieval, and the results are added to massive databases of web page indices, from which you can get a very fast response when you run a keyword search.
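
To make the spider idea concrete, here is a tiny sketch in Python. It does not touch the network: TOY_WEB is a made-up link graph standing in for real pages, and crawl is a hypothetical helper name. Real spiders are far more involved, so take this only as an illustration of the seed list and the master list.

```python
from collections import deque

# Hypothetical miniature web: each url maps to the urls it links to.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}


def crawl(seeds, fetch_links):
    """Visit urls breadth-first, adding unseen links to the master list."""
    master = list(seeds)      # every url discovered so far, in order
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                master.append(link)
                queue.append(link)
    return master
```

Starting from the single seed "a.example", the spider ends up with all four urls in its master list, even though the seed links to only two of them directly.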

To start using a search engine, you must first construct a "search query."  Most engines work by matching keywords in the query against a database of web page indices.  This just means you write a topic in the search box and you get results.  Search engines are generally incapable of finding the single best document for any given query, so they are designed to retrieve a number of possible documents in order to increase your chance of finding at least one good site.  Each document or site returned for a query is called a "hit," and most of the time engines will return thousands of hits for a single query.  Results that are irrelevant to the search are called "false hits": hits that don't address the keyword you're trying to search for.  It is impossible to eliminate all false hits in a keyword search, so the term "noise" is used to describe the rate of these false hits.  Our goal today is to increase the signal and reduce the noise when conducting a keyword web search.
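
The keyword matching described above can be pictured as a lookup in an index that maps each word to the pages containing it. The sketch below is a toy version under that assumption; PAGES, build_index and hits are names invented for this example, and real engines also rank their hits.

```python
from collections import defaultdict

# Hypothetical pages standing in for a crawled collection.
PAGES = {
    "page1": "dark chocolate cake recipe",
    "page2": "white chocolate fudge",
    "page3": "recipe for bread",
}


def build_index(pages):
    """Map each keyword to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def hits(index, query):
    """Return the pages that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

A query for "chocolate" alone hits page1 and page2; if you only wanted dark chocolate, page2 is a false hit, which is the noise we want to cut down.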



B-Subject Trees:



One other search strategy is to use a subject tree, or directory. A subject tree is a hierarchically organized collection of categories and subcategories that can be browsed to locate information.  Subject trees are really just browsing aids because they require some exploration, but they are designed to get you where you need to go, e.g. www.yahoo.com.  Almost every large engine I have seen has a subject tree for many topics.  When using a subject tree, always work from the root to the branches, selecting at each step the option that fits your keyword.  If you don't like to visit many pages and see crappy web art and pictures, then subject trees are your friends. A good subject tree will make it easy for you to get where you want to go without having to go back and look at other topics.  These trees have one big advantage: they only index documents that have been checked by reviewers and accepted as legitimate sources of information.  So this means the stuff they've got is some good shit.  These trees won't cover every topic or every site, but they will aid you in finding something.  Now let's look at the bad side. A difficulty with subject trees is that storing everything relevant to a single topic under a single location is often impossible. For example, say you have to find some stuff on "hacking texts": you will have to look under computers, Internet, programming, and the list goes on.  So this way might work, but usually it won't cover very large or broad topics.



Which is better:



Okay, I have put together a table to show you which one fits your needs.



------------------------------------------------------------
|                 | Search Engines    | Subject Trees      |
|-----------------|-------------------|--------------------|
| Quality of urls | No real control   | Human reviewers    |
|-----------------|-------------------|--------------------|
| Amount of noise | A huge problem    | No problem         |
|-----------------|-------------------|--------------------|
| Dead links      | Many, many        | Very few or none   |
|-----------------|-------------------|--------------------|
| Coverage        | Spiders find      | Few gaps           |
|                 | everything        | (for broad topics) |
|-----------------|-------------------|--------------------|
| Which is easy   | Need to study     | No need for study  |
|                 | advanced features |                    |
|-----------------|-------------------|--------------------|
| Stability of    | Very unstable     | Very stable        |
| results         |                   |                    |
------------------------------------------------------------



+++++++++++++++++

Newbie Cool Note|

+++++++++++++++++

Well, if you still don't have an idea of what to look for and how, I found a good little tool. The WebCrawler search engine has set up a page that displays the keywords of 28 random searches being submitted to WebCrawler by real users in real time. The display is automatically updated every 15 seconds, or you can update it by pressing the refresh or reload button, depending on your browser. The url is http://webcrawler.com/cgi-bin/SearchTicker

++++++++++++++++++++++++++++

2-Information Retrieval Concepts



Let's start learning ways to find what we are looking for in a simple, easy manner.  We are trying to write sharper queries, but first we need to know how these engines really work.

Information Retrieval (IR) is a branch of computer science that deals with finding information in large text databases. This branch has been around for many years, but became popular once the web became a popular attraction.  A web search engine is an IR system dressed up in a user-friendly interface. Under this nice, clean interface is a computer program that doesn't understand natural language and has no ability to comprehend your information needs.  IR systems work by checking the keywords in your input query and then trying to locate documents that contain those exact keywords.  So it's your job to construct a good search query if you're going to hope for the best.  Well, you do so, but you still see crap sites showing up in your results, and you say, Mikkkee this bullshit doesn't work, why do these lame sites keep showing up? I am guessing you don't know anything about HTML, so let me explain.

The ranking method used by the engines tries to put all relevant documents that contain your keywords at the very top of the list.  When you see the engine coming back with urls for "cooking" when you were searching for "hacking," the error is not the engine's fault or your fault: the engine was responding to hidden text on the web page.  HTML documents can contain text that is not displayed by web browsers but is nevertheless used by search engines.  In particular, a document may contain a list of keywords for retrieval purposes that is never displayed by web browsers. A hidden description can likewise be used by a search engine when it describes the document in its hit list. Document keywords and document descriptions are created with the <meta> tag in HTML.
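
To see what an engine might read from such hidden text, here is a small sketch using Python's standard html.parser module. The SAMPLE page is invented for this illustration (it is not the author's original example), and MetaReader and read_meta are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical page: the visible text is about hacking, but the hidden
# keywords also mention cooking.
SAMPLE = """<html><head>
<title>Hacking texts</title>
<meta name="keywords" content="hacking, security, cooking">
<meta name="description" content="A collection of security texts.">
</head><body>Visible text about hacking.</body></html>"""


class MetaReader(HTMLParser):
    """Collect the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)
            if "name" in fields and "content" in fields:
                self.meta[fields["name"].lower()] = fields["content"]


def read_meta(html):
    """Return a dict of the named <meta> entries found in html."""
    reader = MetaReader()
    reader.feed(html)
    return reader.meta
```

An engine that indexes the keywords entry of this page would return it for a "cooking" search, even though no visitor ever sees that word on the page.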


(Note, if you want to learn more about HTML feel free to read the tutorials on HTML at http://blacksun.box.sk)



:*~*:._.:*~*:._.

3-Fuzzy Queries

:*~*:._.:*~*:._.

Okay, the most popular engines, like AltaVista, offer a "simple query option," whereby users can type full sentences describing what they are looking for. Many people think the sentence should be in good English; well, not really.  You can enter ungrammatical sentences, incomplete sentence fragments, disjoint phrases, or just plain nonsense, and the engine won't know the difference (remember what I said about IR programs!).  The engine will only see a bunch of words. It will then remove from the query any ignored words, whether marked by the user or dropped by the engine itself. By doing so, the hit list will be dramatically decreased.

It is very good to start searching by constructing a fuzzy query, since it will show you how big your returned hit list will be. If your returns are in the hundreds or thousands, then you have an idea of the task ahead of you.



Okay let me construct a fuzzy query and give you an idea of what will be returned.



Let's pretend this is the search box.



"I need a recipe for dark chocolate for my family"



I press search and I get 30,000 hits.(I rounded)



Now I don't have time to look at every one of those sites, so let's try to make our return list smaller.



Let's pretend this is the search box.



"I need a +recipe for +dark +chocolate for my family"



I press search and I get 3,000 hits.(I rounded)



+"recipe" +"Dark chocolate" -"White chocolate"



I press search and I get 600 hits.(I rounded)



+"recipe" AND +"Dark chocolate"



I press search and I get 30 hits.(I rounded)



Now you're saying, okay cool, can I try this for other queries? The answer is yes. All I did was improve the hits at the top of the document rankings by adding (+), which marks required terms, and (-), which marks prohibited terms.  Most search engines that allow you to construct fuzzy queries let you mark terms as required or prohibited.  When a search engine sees these tags, it reduces its hit list by deleting all documents that contain any prohibited term, as well as all documents that fail to contain all the required terms.
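
The required/prohibited rule can be written down in a few lines. This is only a sketch of the rule as described here, not how any particular engine implements it; the names parse_fuzzy, filter_hits and DOCS are made up, and quoted phrases are not handled.

```python
# Hypothetical documents standing in for a hit list.
DOCS = {
    "d1": "dark chocolate recipe",
    "d2": "white chocolate recipe",
    "d3": "dark roast coffee",
}


def parse_fuzzy(query):
    """Split a query into required (+), prohibited (-) and optional words."""
    required, prohibited, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            prohibited.add(token[1:])
        else:
            optional.add(token)
    return required, prohibited, optional


def filter_hits(docs, query):
    """Keep documents with all required words and no prohibited ones."""
    required, prohibited, _optional = parse_fuzzy(query)
    kept = []
    for name, text in docs.items():
        words = set(text.lower().split())
        if required <= words and not (prohibited & words):
            kept.append(name)
    return sorted(kept)
```

With these three documents, the query "+recipe +dark -white" keeps only d1, while a plain "chocolate" with no markers filters nothing out.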



.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.

Note: Not all engines let you add (+) and (-). Some will accept them, but they might conflict with what the engine already does by default.  For example, to enter a fuzzy query for HotBot, change the default setting "all of the words" to "any of the words."

:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.

4-Using Logical Expressions, Boolean Operators

:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.

Almost every search engine that I have seen gives you an opportunity to enter a query in the form of a logical expression.  A "Boolean query" is a query that combines keywords and key phrases in logical relations.  The logical connectives used to combine terms are called "Boolean operators."  The most commonly used Boolean operators are AND, NOT and OR, which can be used to narrow or broaden queries. Look at the last search I did: I used the AND operator to make my hit list smaller. So how do AND, OR and NOT work?



AND ---->  blah1 AND blah2 ==== both blah1 and blah2 have to be present;

           the AND operator narrows the search.



OR  ---->  blah1 OR blah2 ==== either blah1 or blah2 (or both) has to be

           present; the OR operator broadens the search.



NOT ---->  blah1 AND NOT blah2 ==== blah1 must be present and

           blah2 must not be present.
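
If you think of each term as the set of pages that contain it, the three operators are just set operations. A quick sketch, with POSTINGS as made-up data and the function names chosen only for this example:

```python
# Hypothetical posting lists: the set of pages containing each term.
POSTINGS = {
    "blah1": {"p1", "p2", "p3"},
    "blah2": {"p2", "p4"},
}


def boolean_and(a, b):
    """Pages containing both terms (narrows the search)."""
    return a & b


def boolean_or(a, b):
    """Pages containing either term (broadens the search)."""
    return a | b


def boolean_and_not(a, b):
    """Pages containing the first term but not the second."""
    return a - b
```

Here blah1 AND blah2 leaves one page, blah1 OR blah2 gives four, and blah1 AND NOT blah2 gives two.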



Boolean queries are not hard to understand, but the difficulty comes when you have to select the right terms.  Some people I have talked to say that Boolean queries give the best search results.  There are many variations in using Boolean queries, but the thing to keep in mind is that if a Boolean query contains more than three terms, it is probably too complicated, so keep it simple!



.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.

Newbie Note:  Where can I Enter a Boolean Query?

.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.

Where should I enter a Boolean query? If you enter a Boolean expression where the search engine isn't expecting it, the interface will probably not object. It will do the search, but it won't interpret the Boolean operators as you intended. So to conduct a valid Boolean query, I am going to use the HotBot engine.  Change the default setting from "all of the words" to "the Boolean expression."  If you happen to be using the AltaVista engine, you must select the "Advanced Search" option in order to enter a Boolean query. If you use the InfoSeek engine, just forget about it: it didn't support Boolean searches when I last checked.

.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.

5-More Signal and Less Noise, Deeper into Boolean expressions.



This should be short. Let's say you're searching for some educational site or info, and you keep getting sites that deal with business, or advertisements that only piss you off and make your life miserable.  Well, you can filter those pages out.  Say you're searching for computer parts (you're looking for texts which explain how each part of the computer works). You keep getting sites which are selling computers, and nothing that deals with knowledge of computer parts. So you start by constructing a query, namely +"computer parts" AND +"explanation". You get a good number of sites, but you still see many sites that are selling computers. Those are commercial sites, and they only have ads that are of no use to us. So let's filter them out.

Three of the most popular engines support a DOMAIN SEARCH that makes it possible to remove commercial pages from the hit list. A domain search is much like a title search. When you preface a term with a domain tag, that term will be matched against the host name of a document's url. Most commercial sites end with a .com domain, so all we have to do is remove any hits that contain ".com" in their host names.  I am only going to explain the 3 most popular engines, so if your favorite engine is not covered in this file, just consult its help file.

AltaVista, HotBot and InfoSeek each use their own tag for domain names, but they all do the same shit.



Host: (for altavista)

domain: (for hotbot)

site: ( for Infoseek)



So now you use a domain tag as a preface to a web server's host name or a portion of it.  In this example, the hit list will exclude web pages on the basis of the host names you specify:



   -host:.com   +computer explanation   (for AltaVista)



   -site:.com   +computer explanation   (for InfoSeek)



   -domain:com  +computer explanation   (for HotBot)



Note that HotBot doesn't need the dot: just com, not .com.



So by combining a domain tag with a prohibited marker (-), you can take out all the web pages that come from .com sites.



Even though I am showing you ways to decrease search engine "noise," if you start filtering out .com addresses you might miss out on some good stuff, so use this trick, but know what you're doing first.
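
What the engine does with such a prohibited domain tag amounts to a filter on host names. Here is a rough sketch using Python's urllib.parse; the function name exclude_domain and the example.* urls used with it are placeholders, not real sites.

```python
from urllib.parse import urlparse


def exclude_domain(urls, suffix):
    """Drop urls whose host name ends with the given domain suffix."""
    suffix = "." + suffix.lstrip(".").lower()
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if not host.endswith(suffix):
            kept.append(url)
    return kept
```

Given a hit list with one .com, one .edu and one .org page, exclude_domain(hit_list, ".com") returns only the last two. The suffix may be written with or without the leading dot.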

.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.

A simple nice trick!



Okay, let's say that you're using all the stuff I am telling you about and your Boolean operators narrow your search to about 30 hits, and you're just too lazy but still want to narrow it a little more, so you beg and I answer. heh

Okay, let's say you're looking for info about Winzip because you're having some problems, like your mouse won't work with it. Okay, the example is lame, but I can't think of anything better right now. You can call tech support or you can search, so let's search!

So you type:



host:winzip.com AND +"Microshit Intellimouse" AND  zip*



The host: tag narrows the search to only the urls with the winzip.com domain name.  Please notice that this example is fake, so there might not be any results if you're searching for info on the Microshit Intellimouse and Winzip, heh.  Now if you look carefully, you'll notice I put a (*) at the end. What does that mean? On many search engines, the asterisk lets you pick up matches on any terms that begin with the chosen word, here "zip." So "zip*" will match "zipped," "zips" and other words that start with zip. I am looking for words dealing with zip so I can see what I have to do to fix the problem with my mouse.  By doing this, your returned hits will be extremely narrow, thus saving you a whole lot of time.
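
The asterisk is a prefix match. In code it could look something like this; expand_wildcard is a hypothetical helper and the word list passed to it is whatever vocabulary the engine has indexed.

```python
def expand_wildcard(pattern, vocabulary):
    """Return the words matched by a pattern with an optional trailing *."""
    pattern = pattern.lower()
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted(w for w in vocabulary if w.lower().startswith(stem))
    return sorted(w for w in vocabulary if w.lower() == pattern)
```

Note that "zip*" picks up words that begin with zip, so a word like "unzip" is not matched.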



:*~*:._.:*~*:._.:*~*:

6-Meta Search Engines

:*~*:._.:*~*:._.:*~*:

Let's say you're trying your best but the engine you're searching from isn't giving you what you want, so what do you do? One life saver is the meta search engine.  Tapping multiple search engines one by one is tedious and repetitive, and the task might take forever, but hail the meta search engine.  With a meta search engine, you type in your query once and press enter.  The query is then sent by the meta search engine to a number of other popular and well-known search engines. So instead of searching one engine after another, you can exploit this tool to your advantage. By using Boolean operators with the meta engines, you will get the task done in a short period of time: instead of wasting hours, you'll have results in seconds.

So how do meta engines work?  A meta search engine collects the top hits from each of its search engines and decides how to present them to you.  It may try to interleave them in a single ranked list, or it may let you view them in blocks, one engine at a time.  Even though this sounds like heaven, you need to know that hell still exists: not all queries will be answered in a narrow fashion.
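
One simple way to interleave several ranked lists is round-robin: take the first hit of every engine, then the second, and so on, skipping duplicates. This is just one possible scheme, sketched with a made-up function name; real meta engines may merge their results quite differently.

```python
from itertools import zip_longest


def interleave(result_lists):
    """Merge ranked lists round-robin, keeping the first copy of each url."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

For three engines returning [a, b, c], [b, d] and [e], the merged list starts with every engine's top hit (a, b, e) before moving on to the lower ranks.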



Lets look at an example of a meta search engine.



I have chosen Dogpile, found at www.dogpile.com



Dogpile accesses the following 14 engines for the web, including some that are restricted to subject trees:



HotBot                 Magellan
Excite Guide           AltaVista
World Wide Web Worm    Excite
Lycos                  WebCrawler
What U Seek            PlanetSearch
Yahoo                  Lycos A2Z
InfoSeek               WWW Yellow Pages



Dogpile tries to reduce bandwidth consumption and CPU cycles while searching. It tries three engines at a time and moves on to the next three only if you request more hits.  It starts with the narrower databases (the subject trees) and then moves on to the larger engines.  Its queries are limited to a particular syntactic format; check out their online documentation for a complete list, I don't have time to do that for you, hehe.  When you enter a query, Dogpile shows the exact query submitted to each engine and tells you how many hits were found by each engine it tried.  The first six it will try are Yahoo, Excite Guide, Lycos, Lycos A2Z, WWW Yellow Pages and the World Wide Web Worm.  These engines have small databases, so don't be discouraged by their results; try the others.  Dogpile first checks whether at least ten hits have been collected; if not, it automatically moves on to the next three engines. Otherwise Dogpile stops and waits for instructions from the user.  The main disadvantage of meta engines is that they have to massage your query to comply with every engine. Whenever possible, Dogpile generates a Boolean query by inserting AND between each pair of terms.  Some engines, like InfoSeek, don't support Boolean operators or quoted phrases, so those engines won't respond with any hits.  Other problems arise when we want to use domain constraints like the -host:.com I told you about. A meta search engine is no place for those, because most engines don't support domain constraints, and the ones that do require different tags for the constraining terms.
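
The "three engines at a time until ten hits" behaviour can be sketched as a simple loop. This is a guess at the logic from the description above, not Dogpile's actual code; batched_search is a made-up name, and each engine is represented by a (name, search function) pair supplied by the caller.

```python
def batched_search(engines, query, batch_size=3, enough=10):
    """Query engines batch by batch until enough hits are collected."""
    collected = []
    tried = []
    for start in range(0, len(engines), batch_size):
        for name, search in engines[start:start + batch_size]:
            tried.append(name)
            collected.extend(search(query))
        if len(collected) >= enough:
            break   # stop and wait for the user to ask for more
    return tried, collected
```

If every engine returns two hits, the first batch of three yields only six, so the loop moves on to a second batch and stops at twelve hits without touching the remaining engines.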

Another Meta search engine you can play with is www.metacrawler.com



.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.

7-Advanced Search Features (not for the jumpy people)

.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.

The following descriptions of advanced search features are taken from the online documentation of each of the engines I will list. I am going to describe features from AltaVista, InfoSeek, and HotBot. Some of the engines offer an interface in which you can enter optional search constraints, like the domain name feature. Search engines are upgraded and updated all the time, so if some of these features don't work, check their online documentation.



Constraining Features found in the AltaVista Engine:



title:"Black Sun Research Facility"

This matches pages with the phrase Black Sun Research Facility in the title, which in HTML is <title>blah</title>



anchor:Trojans

This matches pages with the word Trojans in the text of a hyperlink.



text:Netbus

This matches pages that contain the word Netbus in any part of the visible text of a page. Visible means that it's not hidden in meta tags, a link or an image. (Write the tags in lowercase.)



object:Marquee

This matches pages containing the name of the ActiveX object found in an object tag; here it's Marquee.



applet:Dancing

This matches pages containing the name of the Java applet class found in an applet tag.



link:blacksun.box.sk

This matches pages containing a link to a page with blacksun.box.sk in its URL.



image:Flower.jpg

This matches pages with flower.jpg in an image tag. It works best when you also restrict the search to a domain.



url:index.html

Matches pages with the words index and html together in the page's URL.



host:hotmail.com

Matches pages with the phrase hotmail.com in the host name portion of the URL.



domain:sk

Matches pages from the sk (Slovakia) domain. You can also try com or edu, or a country domain such as fr.
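If you build these constraints from a script, a tiny helper keeps the formatting consistent. The sketch below is my own illustration (the name `field_term` is made up, not part of any engine): it lowercases the tag, puts nothing between the colon and the value, and quotes multi-word phrases.

```python
def field_term(field: str, value: str) -> str:
    """Format a search constraint as field:value.

    The field name is lowercased, no space is left after the colon,
    and a value containing spaces is wrapped in double quotes so the
    engine treats it as one phrase.
    """
    field = field.strip().lower()
    value = value.strip()
    already_quoted = value.startswith('"') and value.endswith('"')
    if " " in value and not already_quoted:
        value = f'"{value}"'
    return f"{field}:{value}"


if __name__ == "__main__":
    print(field_term("Title", "Black Sun Research Facility"))
    print(field_term("host", "hotmail.com"))
    print(field_term("domain", "sk"))
```

The three calls print title:"Black Sun Research Facility", host:hotmail.com and domain:sk.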



.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.

Constraining Features found in the Infoseek Engine:



Just follow what I stated above: the field name should be in lowercase and immediately followed by a colon. There should also be no spaces between the colon and the search terms. (Some tags are similar to the AltaVista ones.)



title:"Blacksun Research Facility"

Will find pages with the phrase Blacksun Research Facility in the title portion of the document. So the webmaster would have written <title>Blacksun Research Facility</title> while building the webpage.



site:blah.com

This will find pages on the website blah.com, including its subdomains. So if I searched for site:microsoft.com, this tag would also find blah.microsoft.com, but it would not find domains that don't end in microsoft.com, like microshit.box.sk (no, it will never exist, hehe).



url:movies

Will give you pages with the word movies anywhere in the URL, like this one: http://www.e-online.com/movies (the URL is probably fake, but this is how the tag works).



link:infoseek.com

This tag matches pages that contain a link to a page with infoseek.com in its URL. For example, you can do +link:blacksun.box.sk -url:blacksun.box.sk, which will show you how many external links point to BSRF.
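The external-links trick is easy to generate for any site. Here is a rough Python sketch, where `compose` and `external_links_query` are names I made up for illustration; it only builds the query string and does not talk to any engine.

```python
from typing import Iterable


def compose(required: Iterable[str] = (),
            excluded: Iterable[str] = (),
            plain: Iterable[str] = ()) -> str:
    """Join terms into one query: +term must match, -term must not."""
    parts = [f"+{t}" for t in required]
    parts += [f"-{t}" for t in excluded]
    parts += list(plain)
    return " ".join(parts)


def external_links_query(site: str) -> str:
    """Pages that link to `site` but are not hosted under `site` itself."""
    return compose(required=[f"link:{site}"], excluded=[f"url:{site}"])


if __name__ == "__main__":
    print(external_links_query("blacksun.box.sk"))
```

Running it prints +link:blacksun.box.sk -url:blacksun.box.sk, the same query as in the example above.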

.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:*~*:.



HotBot

The HotBot engine allows a slightly more advanced user to use its non-text search features from the main search box. A meta word is a keyword:value pair, separated by a colon (no spaces between them). Like the rest of the engines, HotBot lets you have greater control over your searches this way. It is very important to know that HotBot treats meta words as words, not as commands that affect the whole search. So if I wrote title:blacksun hacking, it would return documents with the word blacksun in the title and hacking in the body of the document.

You can also try this:



   -feature:image +title:blacksun hacking



This tells the engine not to return pages that contain images, but to return pages that have blacksun in the title and hacking in the body.
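To see why meta words behave like ordinary words, it helps to split a query into its pieces. The following Python sketch is only my illustration of the keyword:value idea (the `Token` type and `parse_query` function are hypothetical, and it does not handle quoted phrases); it is not how HotBot itself parses queries.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    sign: str               # "+", "-" or "" when no sign was given
    keyword: Optional[str]  # meta word name, or None for a plain word
    value: str


def parse_query(query: str) -> List[Token]:
    """Split a query on whitespace into signed meta words and plain words."""
    tokens = []
    for raw in query.split():
        sign = ""
        if raw[0] in "+-":
            sign, raw = raw[0], raw[1:]
        keyword, colon, value = raw.partition(":")
        if colon and keyword and value:
            # keyword:value with nothing missing on either side
            tokens.append(Token(sign, keyword.lower(), value))
        else:
            tokens.append(Token(sign, None, raw))
    return tokens


if __name__ == "__main__":
    for token in parse_query("-feature:image +title:blacksun hacking"):
        print(token)
```

Each token carries its own sign, so -feature:image only excludes pages with that feature and has no effect on how hacking is matched.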



Here are some more:



depth:[number]

This tag will restrict the depth of pages retrieved



title:[word]

Already explained like a million times!



scriptlanguage:[language]

This will search for pages containing scripts written in JavaScript or other script languages.



linkext:[extension]

This restricts the search to pages containing embedded files with a certain extension, like .ra, which will find pages containing RealAudio filz.



feature:[name]

This is very cool, and is also available under the media type panel.

Examples:



feature:audio           <--- detects audio formats like .au, .wav
feature:flash           <--- detects flash plugins, which are kewl
feature:script          <--- detects scripts in a page, like javascripts
feature:table           <--- detects tables in the page
feature:shockwave       <--- detects shockwave filz
feature:ActiveX         <--- detects activex controls
feature:applet          <--- detects java applets in a page


:*~*:._.:*~*:

8-Date meta tags

:*~*:._.:*~*:

This feature might not work on many engines, but try it out; it's very powerful and will narrow your hits dramatically!



With date meta tags, your search is restricted to pages that fall within the dates you specify. Currently these tags are special cases in the search engine: you will only get good results when they are used correctly within a Boolean expression, without any pluses or minuses in front of them. Here is an example:

(+hacking -cracking) AND within:2/months       <--- this is okay

+hacking -cracking AND +within:2/months        <--- this is wrong (you can't add a + or - after the Boolean operator)



Here are some more tags:



within:number/unit

I explained this already; the unit part can be years, months, or days.



before:day/month/year

Restricts the search to pages dated before the specified date. If I wrote before:3/7/91, then nothing after that date SHOULD appear in my hit list.



after:day/month/year

Restricts the search to pages dated after the specified date.
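To make the date tags concrete, here is a rough Python sketch of how a page date could be checked against them. It is my own illustration, not any engine's real logic, and it rests on several assumptions: two-digit years are read as 19xx, a month counts as 30 days and a year as 365, before: includes the given day, and all function names are made up.

```python
from datetime import date, timedelta

# Rough unit lengths in days; real engines may count differently (assumption).
UNIT_DAYS = {"days": 1, "months": 30, "years": 365}


def parse_dmy(text: str) -> date:
    """Parse day/month/year; two-digit years are treated as 19xx (assumption)."""
    day, month, year = (int(part) for part in text.split("/"))
    if year < 100:
        year += 1900
    return date(year, month, day)


def within_cutoff(spec: str, today: date) -> date:
    """Oldest date still allowed by a within:number/unit tag."""
    number, unit = spec.split("/")
    return today - timedelta(days=int(number) * UNIT_DAYS[unit.lower()])


def date_matches(page_date: date, tag: str, today: date) -> bool:
    """Check one page date against a before:, after: or within: tag."""
    keyword, _, value = tag.partition(":")
    keyword = keyword.lower()
    if keyword == "before":
        return page_date <= parse_dmy(value)  # nothing after that date
    if keyword == "after":
        return page_date > parse_dmy(value)
    if keyword == "within":
        return page_date >= within_cutoff(value, today)
    raise ValueError(f"unknown date tag: {tag}")


if __name__ == "__main__":
    today = date(2000, 3, 1)
    print(date_matches(date(1991, 7, 2), "before:3/7/91", today))
    print(date_matches(date(1999, 12, 31), "within:2/months", today))
```

Note that before:3/7/91 is read as 3 July 1991, following the day/month/year order given above.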

.:*~*:._.:*~*:._.:*~*

Advanced Nerdy Feature

For you guys who just love to learn more, check out the view source option in your browser to examine the query comment near the top of the results page. This will show you how the query you specified with the engine's form has been mapped to meta words.

.:*~*:._.:*~*:._.:*~*

Another advanced fun feature

A fun thing you can play with most search engine is to search for keywords like "0:0" and "root:". If the engine allows such searches u can collect password files from misconfigured web servers. So that's a fun idea. Another fun feature  which usually brings up nice results is "url:.htpasswd" and "url:.htaccess" if u know what you are doing then the first one will bring up the location of password filz, where the restricted directories are. The second one "url:.htaccess" will bring up a username and an encrypted password. If you manage to get the password file then u will be able to crack the file from the many available cracking tools.  Other fun searches are url:etc and link:passwd . I tried those on altavista and didn't see great results so search on those if u really got time. Try your luck on old engines that are small or some of the big ones like dogpile and altavista. If worked out correctly your searches can bring up vital results.



.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:

9-Building a Search Engine in PHP

.:*~*:._.:*~*:._.:*~*:._.:*~*:._.:

This is taken from spiderman's tutorial found at http://spiderman.datablocks.net/



This short mini tut will show you how to create a search engine that you can use for your very own site or some other project that you may be working on.



Before we begin I'll show you the table I'll be using  in the examples:

+------------+-------------+------------+
| First_Name | Middle_Name | Last_Name  |
+------------+-------------+------------+
| Dana       | Johnson     | Smith      |
| Jill       | Angel       | Petersburg |
| Jack       | Coner       | Mitchel    |
+------------+-------------+------------+





Before we actually make the search engine, we need to create a basic webpage with an input field where the user can enter his or her search query. I'm going to keep mine simple; feel free to make an elaborate one with lots of bells and whistles. The code for my page is below:





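<html>
<head>
<title>Simple Search Engine version 1.0</title>
</head>
<body>
<form action="search.php">
Enter the first, last, or middle name of the person you are looking for:
<!-- the field name must match the $search_query variable used by search.php -->
<input type="text" name="search_query">
</form>
</body>
</html>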















That's a pretty basic page, so I'm not going to explain a lot of it. Basically the user will enter the first, middle, or last name of the person they are looking for and hit enter. The contents of the input field will be passed to a PHP script named search.php, and that will handle the rest.

Now that the page is out of the way, let's create the actual script. First we connect to the database server using mysql_connect( ) and select the database using mysql_select_db( ). Then we check the value passed to the script to see if it contains any invalid input, such as numbers and funky characters like #&*^. You should always validate input on the server; don't rely on things like JavaScript to do it for you, because once the user disables JavaScript all that fancy validation goes down the toilet. To check the input we are going to use a regular expression. They are a bit confusing and will be explained in a later tutorial; for now all you need to know is that it checks whether the value passed is a string of letters. All right, enough chatter, here is the first part of the script:




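<?php

           // Read the value submitted by the form
           // (the field is assumed to be named search_query).
           $search_query = isset($_REQUEST["search_query"]) ? $_REQUEST["search_query"] : "";

           // Connect to the MySQL server and select the database.
           mysql_connect("host", "username", "password")
               or die("Can't connect!");

           mysql_select_db("Names")
               or die("Can't select database!");

           // Accept the query only if every character is a letter.
           if (!preg_match('/^[[:alpha:]]+$/', $search_query))
           {
               echo "Error: you have entered an invalid query, you can only use characters!<br>";
               exit;   // don't run the search with invalid input
           }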

   Now that we've done that we will form the search query.

$query = mysql_query("SELECT * FROM some_table
                      WHERE First_Name = '$search_query'
                         OR Middle_Name = '$search_query'
                         OR Last_Name = '$search_query'
                      ORDER BY Last_Name");

Look confusing? I'll explain. We are asking MySQL to search the First_Name, Middle_Name, and Last_Name columns of every row for a match to the query entered by the user; once it has found some results, we ask MySQL to alphabetize them by Last_Name. The rest of the coding from now on is a breeze. We will check whether there is a match using mysql_numrows( ) and get the results from the query using mysql_fetch_array( ). If there is a match, or matches, we will output them along with the number of matches found; if there isn't, we will report to the user that we couldn't find anything.



$result = mysql_numrows($query);   // number of matching rows

if ($result == 0)
{
    echo "Sorry, I couldn't find any user that matches your query ($search_query)";
    exit;
}
else if ($result == 1)
{
    echo "I've found 1 match!<br>";
}
else
{
    echo "I've found $result matches!<br>";
}

// Print every matching row (this also covers the single-match case).
while ($row = mysql_fetch_array($query))
{
    $first_name  = $row["First_Name"];
    $middle_name = $row["Middle_Name"];
    $last_name   = $row["Last_Name"];

    echo "The first name of the user is: $first_name.<br>";
    echo "The middle name of the user is: $middle_name.<br>";
    echo "The last name of the user is: $last_name.<br>";
}

?>



I added that extra if statement so that when we report how many users we've found, the output will be in proper English. If we didn't, the script would echo "I've found 1 matches", which obviously isn't good grammar : P The rest of the script basically loops through the results and prints them to a webpage. That's all, we've finished the script! The entire script is included below:







<html>
<head>
<title>Simple Search Engine version 1.0 - Results</title>
</head>
<body>
<?php

// Read the value submitted by the form
// (the field is assumed to be named search_query).
$search_query = isset($_REQUEST["search_query"]) ? $_REQUEST["search_query"] : "";

// Connect to the MySQL server and select the database.
mysql_connect("host", "username", "password")
    or die("Can't connect!");

mysql_select_db("Names")
    or die("Can't select database!");

// Accept the query only if every character is a letter.
if (!preg_match('/^[[:alpha:]]+$/', $search_query))
{
    echo "Error: you have entered an invalid query, you can only use characters!<br>";
    exit;   // don't run the search with invalid input
}

$query = mysql_query("SELECT * FROM some_table
                      WHERE First_Name = '$search_query'
                         OR Middle_Name = '$search_query'
                         OR Last_Name = '$search_query'
                      ORDER BY Last_Name");

$result = mysql_numrows($query);   // number of matching rows

if ($result == 0)
{
    echo "Sorry, I couldn't find any user that matches your query ($search_query)";
    exit;   // No results found, why bother executing the rest of the script?
}
else if ($result == 1)
{
    echo "I've found 1 match!<br>";
}
else
{
    echo "I've found $result matches!<br>";
}

// Print every matching row (this also covers the single-match case).
while ($row = mysql_fetch_array($query))
{
    $first_name  = $row["First_Name"];
    $middle_name = $row["Middle_Name"];
    $last_name   = $row["Last_Name"];

    echo "The first name of the user is: $first_name.<br>";
    echo "The middle name of the user is: $middle_name.<br>";
    echo "The last name of the user is: $last_name.<br>";
}

?>
</body>
</html>







If you copy this page and run into errors, it might be because of formatting problems that crept into the code. If this happens, check spiderman's site for further aid.
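If you don't have a PHP and MySQL setup handy, the same lookup logic can be tried out in Python with the built-in sqlite3 module. This is only a sketch of the idea in a different language and database, not spiderman's method: the function names are made up, the table and column names are borrowed from the example above, and SQLite compares text case-sensitively by default, so "jill" will not match "Jill" here.

```python
import re
import sqlite3
from typing import List, Tuple

Row = Tuple[str, str, str]


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory copy of the example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE some_table "
        "(First_Name TEXT, Middle_Name TEXT, Last_Name TEXT)"
    )
    conn.executemany(
        "INSERT INTO some_table VALUES (?, ?, ?)",
        [("Dana", "Johnson", "Smith"),
         ("Jill", "Angel", "Petersburg"),
         ("Jack", "Coner", "Mitchel")],
    )
    return conn


def search_names(conn: sqlite3.Connection, search_query: str) -> List[Row]:
    """Return rows whose first, middle or last name equals the query."""
    # Validate on the server side: letters only, nothing else.
    if not re.fullmatch(r"[A-Za-z]+", search_query):
        raise ValueError("invalid query: you can only use letters")
    # Placeholders (?) keep the user's text out of the SQL statement itself.
    cursor = conn.execute(
        "SELECT First_Name, Middle_Name, Last_Name FROM some_table "
        "WHERE First_Name = ? OR Middle_Name = ? OR Last_Name = ? "
        "ORDER BY Last_Name",
        (search_query, search_query, search_query),
    )
    return cursor.fetchall()


def summary(count: int) -> str:
    """Report the number of matches in proper English."""
    if count == 0:
        return "Sorry, I couldn't find any user that matches your query"
    if count == 1:
        return "I've found 1 match!"
    return f"I've found {count} matches!"


if __name__ == "__main__":
    rows = search_names(demo_connection(), "Jill")
    print(summary(len(rows)))
    for first, middle, last in rows:
        print(f"{first} {middle} {last}")
```

The query placeholders are a second line of defence on top of the letters-only check, so the search stays safe even if the validation rule is loosened later.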

.:*~*:._.:*~*:

10-Final Note

.:*~*:._.:*~*:

Many people think that one certain engine has every page indexed in it. Well friends, this belief is far from true. Currently, there are thought to be about 150 million documents on the Web, although an exact count is very, very difficult. While writing this file, I talked with a friend and he said the largest search engine indexes at most 55 million documents, which is only a fraction of the pages on the web. So if you don't find what you're looking for, don't complain to me; I can help, but if that certain engine doesn't have the page you're looking for, then try another. If you're still having problems concerning this topic, email me and I'll try my best to help you out.