OPT-OUT and Anonymization

OPT-OUT

The OPT-OUT mechanism uses a specific cookie content to identify a client who does not want to be measured. Because the system provides no direct recognition of clients, there is currently no other way to identify such clients. A client can change its identity at any time by deleting its cookies. Such an identity change always causes the settings of the original client to be lost, since that client no longer occurs in the system from the moment of the change.

It is therefore necessary that a client who refuses measurement communicates this setting to the system continuously. This means that the client must not delete this specific cookie.

The central place for the OPT-OUT settings is the QDS (questionnaire dispatching system), which also controls the delivery of questionnaires. The QDS provides URLs with which:

  • the Opt-Out cookie can be set
  • the measurement status can be queried
  • the Opt-Out cookie can be deleted

A log of hits, including the accessing client, is currently NOT kept, but one could be generated in order to determine an Opt-Out rate.

The Opt-Out module of the QDS makes it possible to query the client's status ("no cookie", "active in measurement", "Opt-out") and to remove the Opt-Out status again, in which case the measurement cookie is deleted. The module could be integrated into, or called from, any website in a customized form. The Opt-Out module currently does not log any data. However, this could be changed following a joint decision and once the requirements have been determined.
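The status query described above can be illustrated with a minimal sketch. It assumes that the Opt-Out is stored as a special value of the measurement cookie; the cookie name `mid` and the value `OPTOUT` are hypothetical and are not taken from the QDS.

```python
from enum import Enum
from typing import Mapping

# Hypothetical cookie name and Opt-Out content; the real values are not
# specified in this document.
MEASUREMENT_COOKIE = "mid"
OPT_OUT_VALUE = "OPTOUT"


class ClientStatus(Enum):
    NO_COOKIE = "no cookie"
    ACTIVE = "active in measurement"
    OPT_OUT = "Opt-out"


def client_status(cookies: Mapping[str, str]) -> ClientStatus:
    """Derive the measurement status from the cookies sent with a request."""
    value = cookies.get(MEASUREMENT_COOKIE)
    if value is None:
        return ClientStatus.NO_COOKIE
    if value == OPT_OUT_VALUE:
        return ClientStatus.OPT_OUT
    # Any other content is treated as a regular measurement identifier.
    return ClientStatus.ACTIVE
```

Under these assumptions, a request without cookies yields `ClientStatus.NO_COOKIE`, and deleting the cookie is enough to leave the Opt-Out state again.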

Opt-Out Variants

Variants for processing clients with an Opt-Out setting:


OO 1. All users with Opt-Out property are processed under one client identifier

Hits from clients with an active Opt-Out are processed under a single ID and are collected in the transaction data of that one client. Because more than one real client generates usage under this ID, a unique mapping to an individual client is no longer possible.
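A minimal sketch of this variant, assuming hits arrive as (cookie content, URL) pairs. The Opt-Out content `OPTOUT` and the shared identifier `opt-out` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

OPT_OUT_VALUE = "OPTOUT"        # hypothetical Opt-Out cookie content
SHARED_OPT_OUT_ID = "opt-out"   # hypothetical shared client identifier


def client_id_for(cookie_value: str) -> str:
    """Map every Opt-Out client to the same identifier."""
    if cookie_value == OPT_OUT_VALUE:
        return SHARED_OPT_OUT_ID
    return cookie_value


def collect(hits: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group hit URLs by client identifier (the 'transaction data')."""
    transactions: Dict[str, List[str]] = defaultdict(list)
    for cookie_value, url in hits:
        transactions[client_id_for(cookie_value)].append(url)
    return dict(transactions)
```

In this sketch, hits from any number of Opt-Out clients end up in one list, so they can be counted but not attributed to an individual client.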


OO 2. Requests of clients with the Opt-Out property are rejected (or rather, are not passed on beyond the protocol level)

Intensification of OO 1: Hits from clients with this property are answered correctly by the systems, but are not logged.

In this case all hits of those clients are ignored. No usage values are generated.
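This variant can be sketched as a handler that always answers the request but only logs it when no Opt-Out is set. The cookie content `OPTOUT`, the function name and the status code are assumptions made for illustration.

```python
from typing import List, Tuple

OPT_OUT_VALUE = "OPTOUT"  # hypothetical Opt-Out cookie content


def handle_hit(cookie_value: str, url: str, log: List[Tuple[str, str]]) -> int:
    """Answer the hit; record it only if the client has not opted out."""
    if cookie_value != OPT_OUT_VALUE:
        log.append((cookie_value, url))
    # The client receives a normal response in both cases.
    return 200
```

Because nothing is written for Opt-Out clients, no usage values can be derived for them later, and an Opt-Out rate cannot be computed from the log alone.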


Anonymization through shortening of the IPv4 address

To anonymize the requests, the IPv4 addresses should be shortened by their least significant bits. This reduces the resolution accuracy of the address by a factor of 2^(number of bits), which is the number of clients/computers that could be hidden behind the shortened address. There are two places in the system where this shortening could be performed.

IP 1. Shortening at protocol level

In this case the addresses are stored only for the duration of the communication with the requesting computer. All downstream components receive only the shortened address. This affects all log files, all mappings and the signature generation.

When the data enters the boxes, the last bits of the IPv4 address are set to 0.

The full address is not stored or processed beyond the duration of the technical request.
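Setting the last bits to 0 can be sketched with Python's standard `ipaddress` module. The function names are chosen for illustration, and the example address comes from a documentation range rather than from the measurement system.

```python
import ipaddress


def shorten_ipv4(address: str, bits: int) -> str:
    """Return the IPv4 address with its `bits` least significant bits set to 0."""
    if not 0 <= bits <= 32:
        raise ValueError("bits must be between 0 and 32")
    value = int(ipaddress.IPv4Address(address))
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value & mask))


def hidden_addresses(bits: int) -> int:
    """Number of distinct addresses that share one shortened address."""
    return 2 ** bits
```

For example, `shorten_ipv4("192.0.2.173", 8)` gives `"192.0.2.0"`, and `hidden_addresses(8)` gives 256, the number of addresses that become indistinguishable.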

Effects

  1. The log of incoming data (log files) no longer shows the full address
  2. Mappings via address maps produce less precise results
  3. The signature generation produces different signatures

Because the signature generation delivers different results due to the changed IP address, the client resolution in the downstream components changes as well. Clients that are identified via signature or via a signature-cookie pair will be created anew at the moment of the change. To mitigate this effect, the algorithm in the resolution mechanism can be set to "trivial". This algorithm does not consider any correlation between cookie and signature. It uses either the cookie or the signature for identification, preferring the cookie if one exists.
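The decision rule of the "trivial" algorithm can be sketched as follows. The function name and the prefixed return format are assumptions; the actual resolution mechanism is not specified in this document.

```python
from typing import Optional


def resolve_client_trivial(cookie: Optional[str], signature: str) -> str:
    """Identify a client by cookie if one exists, otherwise by signature.

    No correlation between cookie and signature is evaluated.
    """
    if cookie:
        return "cookie:" + cookie
    return "signature:" + signature
```

A client that sends a cookie keeps its identity in this sketch even when its signature changes because of the shortened address. Only clients without a cookie are affected by the new signatures.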


IP 2. Shortening during further processing

In this case the addresses are shortened by the respective component before processing. Because this happens within the individual components, it can be applied selectively. This means the shortening may take place in the log stream and during the mappings, while the signature generation may proceed as usual.

Shortening the address before the signature generation will affect the visits (probably slightly) and the clients (noticeably to severely). In the latter case the identification algorithm should be changed: identification is done via the cookie, or via the signature if no cookie exists. Combinations are no longer considered. This should reduce the effects (proxy recognition and cookie changes are then no longer taken into account).


Effects of IP address shortening

Besides the signature generation, the IP address is used to resolve the client's geographic origin. Here, IP addresses are mapped to domestic and foreign. Because shortening the IP addresses is expected to introduce biases, the mapping files were analyzed. The domestic ranges (the IP ranges in the mapping data that map to Germany) were examined for overlaps after shortening, i.e. ranges that then also point abroad. In this investigation, an affected range was marked as "overlapping" as a whole, not only the affected sub-range. Furthermore, the ranges were weighted by interval size, not by usage intensity.
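The kind of overlap analysis described above can be sketched as follows. The format of the mapping data is not given in this document, so the sketch assumes a list of (first address, last address, country) entries with inclusive bounds; the function names and the country code `DE` as default are likewise assumptions.

```python
import ipaddress
from typing import List, Tuple

# (first address, last address, country code), bounds inclusive
Range = Tuple[int, int, str]


def to_int(address: str) -> int:
    return int(ipaddress.IPv4Address(address))


def ambiguous_share(ranges: List[Range], bits: int, home: str = "DE") -> float:
    """Share of the domestic address space, weighted by range size, that lies
    in ranges colliding with a foreign range after shortening by `bits`."""
    domestic_total = 0
    ambiguous_total = 0
    for first, last, country in ranges:
        if country != home:
            continue
        size = last - first + 1
        domestic_total += size
        # Two addresses become indistinguishable if they fall into the same
        # block, i.e. if they agree after dropping the low `bits` bits.
        collides = any(
            c != home
            and (f >> bits) <= (last >> bits)
            and (l >> bits) >= (first >> bits)
            for f, l, c in ranges
        )
        if collides:
            # The whole range counts as overlapping, not only the part hit.
            ambiguous_total += size
    return ambiguous_total / domestic_total if domestic_total else 0.0
```

With made-up data in which a domestic range shares a /24 block with a foreign range, the share is non-zero for 8 bits but zero for 7 bits, because the two ranges then fall into different blocks. The percentages reported in this document come from the real mapping files and cannot be reproduced from this sketch.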

Shortening by one octet (the 8 least significant bits) yields a share of ambiguous ranges of approximately 8%. If resolution at state level is required, an additional share of approximately 6% of ambiguous ranges (within the domestic ranges) has to be added.

An investigation of how the ambiguity depends on the number of shortened bits shows a jump from less than 1% when shortening by 3 bits to 8% when shortening by 4 bits or more.

To achieve the best possible ratio between data protection and resolution accuracy, the IP address could, based on the results described above, generally be shortened by 3 bits. Because this may not be enough to fully anonymize the signature, the IP address could be shortened by a further 5 bits, i.e. by 8 bits in total, during the (pre-)generation of the signature.
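The two-stage proposal can be sketched as follows: a general shortening by 3 bits, followed by a further 5 bits for the address that enters the signature generation. The constant and function names are hypothetical, and the signature generation itself is not shown because it is not specified here.

```python
import ipaddress
from typing import Tuple

GENERAL_BITS = 3          # general shortening proposed in the text
SIGNATURE_EXTRA_BITS = 5  # further shortening before signature generation


def shorten_ipv4(address: str, bits: int) -> str:
    """Return the IPv4 address with its `bits` least significant bits set to 0."""
    value = int(ipaddress.IPv4Address(address))
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value & mask))


def shortened_addresses(address: str) -> Tuple[str, str]:
    """Return (address for logs and mappings, address for the signature)."""
    general = shorten_ipv4(address, GENERAL_BITS)
    # The low 3 bits are already 0, so clearing the low 8 bits of `general`
    # gives the same result as shortening the original address by 8 bits.
    for_signature = shorten_ipv4(general, GENERAL_BITS + SIGNATURE_EXTRA_BITS)
    return general, for_signature
```

For the example address `192.0.2.173` this gives `("192.0.2.168", "192.0.2.0")`: the mappings keep a resolution of 8 addresses, while the signature only sees the /24 block.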