Identification
From P2P-Fusion
| Identification | |
|---|---|
| Metadata: | public key, private key, signatures |
| Importance: | very high |
Identification is a tool that links actions and objects to an identifier. This is usually done to create an online identity to which actions made by the same person can be attributed. (More generally, files or other resources can have identities too, but that is outside the scope of social processing.)
Description
Identification is the most fundamental social processing tool. It creates a connection between different actions done by the same person. The collection of actions and objects (files, profile, etc.) linked to the same identifier is the person's online identity. It is usually not linked to the person's real-world identity, and does not necessarily contain all of the user's actions; the same person can have any number of online identities. Multiple identities are often used to separate different online activities, or to assume different roles.
Online identities are the basis for most social interactions between users: they allow users to think in terms of persons instead of independent actions. This makes it possible for social mechanisms like reputation and trust to function.
A central concept in identification is authentication: users need to be able to prove that the identity really belongs to them. There are two different approaches for this: central authorities and public-key cryptography. In the first approach, users identify themselves to an authority that is trusted by everyone (e.g. the site operators, or some government agency). This authority then registers their actions, and confirms the connection to other users. In the second approach, users don't need to trust anyone else; instead, they sign everything they do using some public-key cryptography algorithm, and anyone knowing their identifier (the public key) can verify the signature.
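The second approach can be sketched in a few lines. The following is a toy illustration only, assuming textbook RSA over two small, publicly known primes; the names `sign`, `verify` and `PUBLIC_KEY` are made up for this example, and a real client would use a vetted cryptographic library with randomly generated keys and proper padding.

```python
import hashlib

# Toy parameters (assumption for illustration): two known Mersenne primes.
# They are public knowledge, so this key offers no security whatsoever.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, must remain secret

PUBLIC_KEY = (N, E)                  # doubles as the user's identifier


def digest(message: bytes, modulus: int) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus


def sign(message: bytes) -> int:
    """Sign an action with the private key."""
    return pow(digest(message, N), D, N)


def verify(message: bytes, signature: int, public_key=PUBLIC_KEY) -> bool:
    """Anyone who knows the public key can check the signature."""
    modulus, exponent = public_key
    return pow(signature, exponent, modulus) == digest(message, modulus)


action = b"alice: posted a comment"
signature = sign(action)
print(verify(action, signature))                        # signature matches the action
print(verify(b"alice: deleted a comment", signature))   # different action, same signature
```

Verification needs only the message, the signature and the public key, which is why no trusted third party is involved.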
Data
Input
In the centralized approach, an identity is simply a username-password pair. In the decentralized approach, the identifier consists of two parts: a public key, with which the user can announce his identity to other, untrusted users; and a private key, which is used to prove his identity, and therefore must remain secret. Because key pairs suitable for cryptographic protocols aren't user-friendly, users can't be expected to remember them like passwords; instead, their client software must be able to manage the keys.
Storage
Besides the public key (which also serves as a pseudonym representing the user), a signature representing the specific action, message or object must be stored. Storage can be fully decentralized: each client wishing to store information about other clients' actions stores the corresponding public keys and signatures. It can then locally check that the key and the signature match, and, if necessary, transmit the information it has to third parties.
Output
Users typically have nicknames associated with their identities; usually only this name is shown to a human user. To other tools, the public key is returned.
Almost all tools use identification; the most prominent are access management, reputation and trust.
Dependencies
Almost every other social processing tool depends on identification. Some, like commenting, can work with full anonymity, but even those work much better when at least optional identification is possible.
Management
The most typical configuration settings are optional anonymity vs. mandatory identification, and countermeasures against automated identity creation (e.g. captchas).
Interface
The text of the nickname is a good place to include additional information: the name itself can link to the profile; next to it, reputation, trust, friend-of-friend and similar information can be displayed in some compact form (icons or numbers).
Prevalence
Identification is used virtually everywhere, and is required by almost every other social processing tool. Some simpler commenting systems use "good faith" identification without any authentication (users just say who they are), but that opens them up to abuse. No real community can function without identification.
On the other hand, many systems and communities make identification optional: users can choose to participate anonymously, though that usually means losing some privileges. (For example, users can comment at Slashdot without registration under the name "Anonymous Coward", but that means a lower starting score for the comment.) This lowers the entry barrier, and usually results in a smaller group of regular users who use identification, and lots of anonymous users who rarely contribute. Depending on the goals of the system, this might be good or bad. For example, most of Wikipedia's content is created by anonymous users and only formatted and regulated by registered users[1]; on the other hand, anonymity is undesirable in systems that are built upon resource sharing (like P2P systems).
Technical aspects
Implementation
Centralized identification (password checking) is, at the core, just string matching, but there are many additional steps to make it more secure: password hashing, salting, a secure communication channel, cookies, sessions, etc.
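As a minimal sketch of the hashing and salting steps, the following uses only Python's standard library. The function names and the iteration count are assumptions chosen for the example, not values taken from any particular system.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for the actual deployment


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived key); only these are stored, never the password."""
    salt = secrets.token_bytes(16)  # fresh random salt per user
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key from the login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_key)
```

Because each user gets a different salt, two users with the same password end up with different stored values, so a precomputed table of common password hashes is of little use to an attacker.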
Decentralized implementations use public-key cryptography; there are many possible choices for the algorithm. RSA is one of the more popular choices. P2P clients often use elliptic curve cryptography because of its smaller key size, which decreases traffic overhead.
Problems
Identification is only as safe as authentication is. An identification scheme which is not well protected against identity theft creates critical vulnerabilities in higher-level social processing tools built upon it. The most frequent method of theft in centralized systems is tricking users into giving out their passwords; when the communication channel or the central server is not properly secured, many other methods of attack are possible. An attacker can also take advantage of the fact that most users have the same password in many systems, so it's enough to break the least secure one.
In decentralized systems identity theft is a much less serious problem. The cryptographic protocols used for authentication are secure by design. Users probably don't even know their private keys, which are handled by the software, so they cannot give them out; and key pairs are usually generated specifically for that system, and not used elsewhere.
Another potential vulnerability is multiple identity creation. It is very hard to prevent a person from having multiple online identities in the same system; that would require checking his real identity reliably, which is very slow and/or requires some sort of trusted middle-man (like a credit card company or a government organization). This is usually not possible, but not necessary either, because a few identities cannot do much harm. If the attacker can create a large number of identities, however, he can use them to manipulate higher-level social processing tools built on identification: influence rating and reputation, create fake trust and so on. They can also influence lower-level protocols: for example, the attacker can create a number of fake identifiers large enough to skew random selection procedures (the so-called Sybil attack).
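The effect on random selection can be made concrete with a little arithmetic. The sketch below assumes the simplest possible model, uniform independent picks with replacement; the function names are invented for this example.

```python
from fractions import Fraction


def sybil_share(honest: int, fake: int) -> Fraction:
    """Probability that one uniformly random pick is attacker-controlled."""
    return Fraction(fake, honest + fake)


def chance_of_hitting_sybil(honest: int, fake: int, picks: int) -> Fraction:
    """Probability that at least one of `picks` independent uniform picks
    (with replacement) lands on an attacker-controlled identity."""
    return 1 - (1 - sybil_share(honest, fake)) ** picks


# 900 honest identities, 100 fake ones, 8 peers picked at random
print(float(chance_of_hitting_sybil(900, 100, 8)))
```

Under this model, an attacker holding only 10% of the identities already appears in more than half of all 8-peer selections (1 - 0.9^8 is roughly 0.57), which is why the number of identities per attacker has to be limited.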
To prevent such attacks (or, at least, make them more difficult), one must limit the number of identities an attacker can create. (Furthermore, higher-level social processing tools should themselves be resistant to smaller multiple-identity attacks.) One way to achieve this is to tie identities to some limited resource (in an online environment, typically the IP address); another is to associate some cost with identity creation, for example by requiring new users to perform some sort of calculation (so-called proof-of-work) or to pay. Sometimes tasks that only humans can perform (like solving a captcha) are used.
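A proof-of-work requirement for identity creation could look roughly like the following hashcash-style sketch. The difficulty value, the byte layout and the function names are assumptions made for the example.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 16  # assumed cost parameter; each extra bit doubles the work


def _meets_target(public_key: bytes, nonce: int, bits: int) -> bool:
    """True if SHA-256(public_key || nonce) starts with `bits` zero bits."""
    data = public_key + nonce.to_bytes(8, "big")
    value = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return value >> (256 - bits) == 0


def mint_identity(public_key: bytes, bits: int = DIFFICULTY_BITS) -> int:
    """Search for a valid nonce; costs about 2**bits hash evaluations on average."""
    for nonce in count():
        if _meets_target(public_key, nonce, bits):
            return nonce


def check_identity(public_key: bytes, nonce: int, bits: int = DIFFICULTY_BITS) -> bool:
    """Verification is a single hash, so it is cheap for everyone else."""
    return _meets_target(public_key, nonce, bits)
```

The asymmetry is the point: creating one identity is a minor nuisance for a legitimate user, while creating thousands multiplies the cost for an attacker.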
These methods, except using IP addresses (which are of limited use because of proxies and firewalls, but are still applicable e.g. in distributed hash tables), are generally unsuitable for decentralized, P2P environments; instead, defense against multiple identities is handled at some other level than identification or identity creation. An example is the choke/unchoke system in BitTorrent, where the bandwidth available to a user is limited by the bandwidth he is donating at the same time.
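The idea behind choking can be sketched as follows. This is a deliberately simplified model, not the actual BitTorrent algorithm: it omits optimistic unchoking, and the function name, the slot count and the rule of ignoring peers that sent nothing are assumptions for the example.

```python
def choose_unchoked(received_rates: dict[str, float], slots: int = 4) -> set[str]:
    """Pick the peers to upload to in the next round: the `slots` peers that
    sent us the most data recently. Everyone else stays choked."""
    # Assumption: peers that contributed nothing never earn a slot here.
    contributors = {peer: rate for peer, rate in received_rates.items() if rate > 0}
    ranked = sorted(contributors, key=contributors.get, reverse=True)
    return set(ranked[:slots])


# Bytes per second recently received from each peer (made-up numbers)
rates = {"peer-a": 50.0, "peer-b": 0.0, "peer-c": 120.0, "peer-d": 10.0}
print(choose_unchoked(rates, slots=2))
```

Since slots are earned by uploading, a crowd of fake identities gains nothing unless each of them also donates bandwidth, which makes identity creation itself irrelevant to the defense.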
Social aspects
One of the social problems is the aforementioned multiple identities. While not too dangerous in the technical sense, multiple identities can be used for misleading others and for trolling, which can be very damaging to a community. Not much can be done against that at the software level, so communities have to solve such problems themselves; however, an audit trail (preferably with timestamps and IP records) can help them a lot.
Another problem is changing identity to escape negative reputation. Unless there is a considerable cost for creating a new identity, users with negative reputation (be it software-registered reputation, or simply other users having a bad opinion of him) can always escape that reputation by re-registering. This makes it impossible to "punish" accounts created solely to annoy or harm others. One solution is to design the system so that users with zero reputation cannot use potentially harmful functions. Another is to impose some (social) cost on creating new identities. This works well with groups: communities can decide for themselves whom to let in (maybe in exchange for some work done for them), and changing identity means the user must convince them again to let him join.
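The first solution amounts to a reputation threshold per action. A minimal sketch, in which the action names and thresholds are hypothetical and chosen only for illustration:

```python
from dataclasses import dataclass

# Hypothetical actions and minimum reputation needed for each.
REQUIRED_REPUTATION = {
    "read": 0,
    "comment": 0,
    "rate_others": 1,
    "send_private_message": 5,
}


@dataclass
class Identity:
    public_key: str
    reputation: int = 0  # every freshly created identity starts at zero


def is_allowed(identity: Identity, action: str) -> bool:
    """Allow an action only if the identity has earned enough reputation.
    Actions missing from the table are denied by default."""
    required = REQUIRED_REPUTATION.get(action)
    return required is not None and identity.reputation >= required
```

Re-registering resets the reputation to zero, so a user who abandons a bad identity also loses access to everything above the zero threshold.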
Real-world identification can also be a problem in the opposite direction: an investigator might be able, with sufficient time and effort, to use comments, time patterns, IP addresses and similar information to connect an online identity to the real-world identity. This can lead to privacy concerns. In a centralized system, the central authority has most of the relevant data, therefore proper privacy policies are important. In decentralized systems, not much can be done to prevent such investigations; IP addresses can be hidden, but only at the price of reducing effective bandwidth to half or less.
Another negative effect of identification is that it raises the barrier to entry: for example, visitors of a web page will be less likely to contribute if they have to go through a registration process. Sites which rely heavily on user contributions usually counter this effect by making identification optional (see Prevalence).
Existing examples
- passwords on various websites
- PGP, which supports RSA and DSA keys, mainly used to digitally sign e-mail
P2P file sharing examples
Application in Fusion
As most other social processing tools rely on identification, it is required for all Fusion scenarios.
External links
- [1] Aaron Swartz: Who Writes Wikipedia?
