TopCoder problems and my programs (1): language frequency

晨曦之光, posted 2012/03/09 14:13

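I had an interview at a company today and was given a TopCoder problem. It went fairly smoothly.

I am posting it here for discussion and study. I will gradually post programs for some classic algorithms I have written before. Corrections are welcome.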

=================================================================================

Problem Statement

In written languages, some symbols may appear more often than others. Expected frequency tables have been defined for many languages. For each symbol in a language, a frequency table will contain its expected percentage in a typical passage written in that language. For example, if the symbol "a" has an expected percentage of 5, then 5% of the letters in a typical passage will be "a". If a passage contains 350 letters, then 'a' has an expected count of 17.5 for that passage (17.5 = 350 * 5%). Please note that the expected count can be a non-integer value.
The deviation of a text with respect to a language frequency table can be computed in the following manner. For each letter ('a'-'z') determine the difference between the expected count and the actual count in the text. The deviation is the sum of the squares of these differences. Blank spaces (' ') and line breaks (each element of text is a line) are ignored when calculating percentages.
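As a compact restatement of the paragraph above, the deviation can be written as a formula. The symbols are introduced here for illustration and do not appear in the original statement: N is the number of letters in the text (spaces excluded), p_c is the table's percentage for letter c, and n_c is the actual count of c in the text.

```latex
D = \sum_{c=\texttt{a}}^{\texttt{z}} \left( \frac{p_c}{100}\, N - n_c \right)^{2}
```

With p_a = 5 and N = 350, the expected count p_a N / 100 is 17.5, which matches the figure quoted in the statement.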
Each frequency table will be described as a concatenation of up to 16 strings of the form "ANN", where A is a lowercase letter ('a'-'z') and NN its expected frequency as a two-digit percentage between "00" (meaning 0%) and "99" (meaning 99%), inclusive. Any letter not appearing in a table is not expected to appear in a typical passage (0%). You are given a String[] frequencies of frequency tables of different languages. Return the lowest deviation the given text has with respect to the frequency tables.
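The "ANN" table format described above could be parsed roughly as follows. This is a minimal sketch, not the program from this post: the helper name parseTable and the use of std::array are assumptions made for illustration, and the input is assumed to be well formed as the constraints promise.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical helper (not part of the original post): turns one frequency
// table such as "a30b30c40" into 26 percentages indexed by letter.
std::array<int, 26> parseTable(const std::string& table)
{
    // Precondition taken from the problem statement: the table is a whole
    // number of three-character "ANN" groups.
    assert(table.size() % 3 == 0);

    std::array<int, 26> percent{};   // letters absent from the table stay at 0%
    for (std::size_t i = 0; i + 2 < table.size(); i += 3)
    {
        const int letter = table[i] - 'a';        // 'a'..'z' -> 0..25
        const int tens   = table[i + 1] - '0';    // first digit of NN
        const int ones   = table[i + 2] - '0';    // second digit of NN
        percent[letter] = tens * 10 + ones;
    }
    return percent;
}
```

Calling parseTable("a30b30c40") yields 30 for 'a', 30 for 'b', 40 for 'c', and 0 for every other letter.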
Definition

Class: SymbolFrequency
Method: language
Parameters: String[], String[]
Returns: double
Method signature: double language(String[] frequencies, String[] text)
(be sure your method is public)


Notes
- The returned value must be accurate to within a relative or absolute value of 1E-9.
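Plain double arithmetic is very likely accurate enough for inputs of this size. As an alternative sketch, every difference can be multiplied through by 100 so that the whole sum stays in exact integers, with a single division at the end. The function name scaledDeviation and its array-based interface are assumptions for illustration, not part of the original post.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: returns 10000 * deviation as an exact integer.
// percent[i] is the table's percentage for letter i, count[i] its actual count.
long long scaledDeviation(const std::array<int, 26>& percent,
                          const std::array<int, 26>& count)
{
    long long total = 0;             // number of letters in the text
    for (int n : count)
    {
        assert(n >= 0);
        total += n;
    }

    long long sum = 0;
    for (int i = 0; i < 26; ++i)
    {
        // (percent/100 * total - count), multiplied through by 100
        const long long diff = percent[i] * total - 100LL * count[i];
        sum += diff * diff;
    }
    return sum;                      // divide by 10000.0 to get the deviation
}
```

With at most 500 letters in the text, each scaled difference is at most 50000 in magnitude, so the sum of 26 squares fits easily in a long long.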
Constraints
- frequencies will contain between 1 and 10 elements, inclusive.
- Each element of frequencies will be formatted as described in the statement.
- Each element of frequencies will contain between 6 and 48 characters, inclusive.
- No letter will appear twice in the same element of frequencies.
- The sum of the percentages in each element of frequencies will be equal to 100.
- text will contain between 1 and 10 elements, inclusive.
- Each element of text will contain between 1 and 50 characters, inclusive.
- Each element of text will contain only lowercase letters ('a'-'z') and spaces (' ').
- text will have at least one non-space character.
Examples
0)


{"a30b30c40","a20b40c40"}
{"aa bbbb cccc"}
Returns: 0.0
The first table indicates that 30% of the letters are expected to be 'a', 30% to be 'b', and 40% to be 'c'. The second table indicates that 20% are expected to be 'a', 40% to be 'b', and 40% to be 'c'. We consider the text to have length 10, as blank spaces are ignored. With respect to the first table, there are 2 'a' where 3 were expected (a difference of 1), one more 'b' than expected (again a difference of 1) and as many 'c' as expected. The sum of the squares of those numbers gives a deviation of 2.0. As for the second table, the text matches expected counts exactly, so its deviation with respect to that language is 0.0.
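The "length 10" in this example comes from counting letters only. A minimal sketch of that counting step is shown below. The struct LetterCounts and the function countLetters are hypothetical names introduced for illustration, and the input is assumed to contain only lowercase letters and spaces, as the constraints state.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: per-letter counts plus the number of letters seen.
struct LetterCounts
{
    std::array<int, 26> count{};
    int total = 0;
};

// Counts the letters in all lines of the text. Spaces are skipped, and line
// breaks never appear because each line is a separate element.
LetterCounts countLetters(const std::vector<std::string>& text)
{
    LetterCounts result;
    for (const std::string& line : text)
    {
        for (char ch : line)
        {
            if (ch == ' ')
                continue;                      // blanks do not count
            assert(ch >= 'a' && ch <= 'z');    // guaranteed by the constraints
            ++result.count[ch - 'a'];
            ++result.total;
        }
    }
    return result;
}
```

For the single line "aa bbbb cccc" this gives a total of 10, with 2 'a', 4 'b' and 4 'c'.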
1)


{"a30b30c40","a20b40c40"}
{"aaa bbbb ccc"}
Returns: 2.0
Here we use the same tables as in the previous example, but with a different text. The counts for 'b' and 'c' each differ by 1 from the expected counts in the first table, and the counts for 'a' and 'c' each differ by 1 from the expected counts in the second table. The text has a deviation of 2.0 with respect to both tables.
2)


{"a10b10c10d10e10f50"}
{"abcde g"}
Returns: 10.8
Here, each of the letters 'a' through 'e' is expected to make up 10% of the letters (0.6 letters). Each of those letters actually appears once, so the difference is 0.4, which becomes 0.16 when squared. 50% of the letters (3 letters) are expected to be 'f', but 'f' does not appear at all. The square of this difference is 9.0. No 'g's are expected to appear, but there is one in the text. This adds 1.0 to the deviation. The final deviation for this table is: 0.16+0.16+0.16+0.16+0.16+9.0+1.0=10.8.
3)


{"a09b01c03d05e20g01h01i08l06n08o06r07s09t08u07x01"
,"a14b02c05d06e15g01h01i07l05n07o10r08s09t05u04x01"}
{"this text is in english"
,"the letter counts should be close to"
,"that in the table"}
Returns: 130.6578
These two frequency tables correspond (roughly) to the frequencies found in the English and Spanish languages, respectively. The English passage, as expected, has a lower deviation in the first table than in the second one.
4)


{"a09b01c03d05e20g01h01i08l06n08o06r07s09t08u07x01"
,"a14b02c05d06e15g01h01i07l05n07o10r08s09t05u04x01"}
{"en esta es una oracion en castellano"
,"las ocurrencias de cada letra"
,"deberian ser cercanas a las dadas en la tabla"}
Returns: 114.9472
The same tables again, but with a Spanish passage. This time the second table, which corresponds to frequencies in Spanish, gives the lowest deviation.
5)

{"z99y01", "z99y01", "z99y01", "z99y01", "z99y01",
 "z99y01", "z99y01", "z99y01", "z99y01", "z99y01"}
{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
Returns: 495050.0

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2003, TopCoder, Inc. All rights reserved.

=================================================================================

The source code is as follows:

#include <stdlib.h>
#include <stdio.h>
#include <vector>
#include <string>

using namespace std;

class deviation
{
public:
    int m_nTotalNumber;          //the total number of all symbols in the actual text
    int m_arrayExpected[26];     //the expected percentage of each symbol
    int m_arrayActual[26];       //the actual number of each symbol

    vector<string> m_vecFreq;    //the frequency tables, one string per language
    vector<string> m_vecText;    //the text array

public:
    deviation();
    ~deviation();

    void InitializeArrayExpected();
    void InitializeArrayActual();

    void GetEachSymbolExpected(string str);
    void GetEachSymbolActual(vector<string> text);
    double GetDeviation();

    double language(vector<string> frequencies, vector<string> text);
};

deviation::deviation()
{
    m_nTotalNumber=0;
    m_vecFreq.clear();
    m_vecText.clear();
    InitializeArrayExpected();
    InitializeArrayActual();
}


deviation::~deviation()
{
}


// initialize the expected array
void deviation::InitializeArrayExpected()
{
    for(int i=0;i<26;i++)
        m_arrayExpected[i]=0;
}


// initialize the actual array
void deviation::InitializeArrayActual()
{
    for(int i=0;i<26;i++)
        m_arrayActual[i]=0;
}


// get each symbol and its frequency from a table string made of "ANN" groups
void deviation::GetEachSymbolExpected(string str)
{
    unsigned char ch1,ch2,ch3;

    int len=str.size();
    for(int i=0;i<len;i+=3)
    {
        ch1=str.at(i);      //the letter
        ch2=str.at(i+1);    //tens digit of the percentage
        ch3=str.at(i+2);    //ones digit of the percentage
        m_arrayExpected[ch1-'a']=(ch2-'0')*10+(ch3-'0');
    }
}


// get each actual symbol
void deviation::GetEachSymbolActual(vector<string> text)
{
    int len;
    unsigned char ch;
    string str;

    for(int i=0;i<text.size();i++)
    {
        str=text.at(i);
        len=str.size();

        for(int j=0;j<len;j++)
        {
            ch=str.at(j);
            if(ch!=' ')        //ch is not space
            {
                m_arrayActual[ch-'a']++;
                m_nTotalNumber++;
            }
        }
    }
}


// get deviation: the sum over all symbols of the squared difference
// between the expected count and the actual count
double deviation::GetDeviation()
{
    double difference=0,sum=0;

    for(int i=0;i<26;i++)
    {
        difference=m_arrayExpected[i]/100.0*m_nTotalNumber-m_arrayActual[i];
        sum+=difference*difference;
    }

    return sum;
}


// calculation
double deviation::language(vector<string> frequencies, vector<string> text)
{
    double dblMin,dblDeviation;

    //initialize the total number of all symbols in the text
    m_nTotalNumber=0;

    //get the actual number of all symbols in the text
    InitializeArrayActual();
    GetEachSymbolActual(text);

    //initialize
    InitializeArrayExpected();
    GetEachSymbolExpected(frequencies.at(0));
    dblMin=GetDeviation();

    //get the minimum deviation
    for(int i=1;i<frequencies.size();i++)
    {
        InitializeArrayExpected();
        GetEachSymbolExpected(frequencies.at(i));
        dblDeviation=GetDeviation();

        if(dblMin>dblDeviation)
            dblMin=dblDeviation;
    }

    return dblMin;
}


int main()
{
    deviation de0,de1,de2,de3,de4,de5;
    double dblMin0=0,dblMin1=0,dblMin2=0,dblMin3=0,dblMin4=0,dblMin5=0;

    //case 0
    de0.m_vecFreq.push_back("a30b30c40");
    de0.m_vecFreq.push_back("a20b40c40");
    de0.m_vecText.push_back("aa bbbb cccc");

    //case 1
    de1.m_vecFreq.push_back("a30b30c40");
    de1.m_vecFreq.push_back("a20b40c40");
    de1.m_vecText.push_back("aaa bbbb ccc");

    //case 2
    de2.m_vecFreq.push_back("a10b10c10d10e10f50");
    de2.m_vecText.push_back("abcde g");

    //case 3
    de3.m_vecFreq.push_back("a09b01c03d05e20g01h01i08l06n08o06r07s09t08u07x01");
    de3.m_vecFreq.push_back("a14b02c05d06e15g01h01i07l05n07o10r08s09t05u04x01");
    de3.m_vecText.push_back("this text is in english");
    de3.m_vecText.push_back("the letter counts should be close to");
    de3.m_vecText.push_back("that in the table");

    //case 4
    de4.m_vecFreq.push_back("a09b01c03d05e20g01h01i08l06n08o06r07s09t08u07x01");
    de4.m_vecFreq.push_back("a14b02c05d06e15g01h01i07l05n07o10r08s09t05u04x01");
    de4.m_vecText.push_back("en esta es una oracion en castellano");
    de4.m_vecText.push_back("las ocurrencias de cada letra");
    de4.m_vecText.push_back("deberian ser cercanas a las dadas en la tabla");

    //case 5
    for(int i=0;i<10;i++)
    {
        de5.m_vecFreq.push_back("z99y01");
        de5.m_vecText.push_back("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
    }

    //calculate the minimum deviation of every case
    dblMin0=de0.language(de0.m_vecFreq,de0.m_vecText);
    dblMin1=de1.language(de1.m_vecFreq,de1.m_vecText);
    dblMin2=de2.language(de2.m_vecFreq,de2.m_vecText);
    dblMin3=de3.language(de3.m_vecFreq,de3.m_vecText);
    dblMin4=de4.language(de4.m_vecFreq,de4.m_vecText);
    dblMin5=de5.language(de5.m_vecFreq,de5.m_vecText);

    //display
    printf("the minimum deviation of case 0 is: %f\n",dblMin0);
    printf("the minimum deviation of case 1 is: %f\n",dblMin1);
    printf("the minimum deviation of case 2 is: %f\n",dblMin2);
    printf("the minimum deviation of case 3 is: %f\n",dblMin3);
    printf("the minimum deviation of case 4 is: %f\n",dblMin4);
    printf("the minimum deviation of case 5 is: %f\n",dblMin5);

    return 0;
}
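The program above keeps its working arrays and the test data in class members and prints the results for inspection. As one possible variation, the same computation can be sketched as a stateless function whose result is compared with the expected values from the problem statement. The names lowestDeviation and closeTo, and the 1e-6 default tolerance, are assumptions made for this sketch and are not part of the original program.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical free-function variant of deviation::language.
double lowestDeviation(const std::vector<std::string>& frequencies,
                       const std::vector<std::string>& text)
{
    assert(!frequencies.empty());

    // Count the letters once; spaces are skipped.
    std::array<int, 26> actual{};
    int total = 0;
    for (const std::string& line : text)
        for (char ch : line)
            if (ch != ' ')
            {
                ++actual[ch - 'a'];
                ++total;
            }

    double best = -1.0;              // sentinel: no table evaluated yet
    for (const std::string& table : frequencies)
    {
        // Parse the "ANN" groups; letters not listed stay at 0%.
        std::array<int, 26> percent{};
        for (std::size_t i = 0; i + 2 < table.size(); i += 3)
            percent[table[i] - 'a'] =
                (table[i + 1] - '0') * 10 + (table[i + 2] - '0');

        // Sum of squared differences between expected and actual counts.
        double sum = 0.0;
        for (int i = 0; i < 26; ++i)
        {
            const double diff = percent[i] / 100.0 * total - actual[i];
            sum += diff * diff;
        }
        if (best < 0.0 || sum < best)
            best = sum;
    }
    return best;
}

// Hypothetical comparison helper for floating-point results.
bool closeTo(double value, double expected, double tolerance = 1e-6)
{
    return value > expected - tolerance && value < expected + tolerance;
}
```

A caller could then write, for example, closeTo(lowestDeviation(frequencies, text), 10.8) for case 2 instead of reading the printed value by eye.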

 


Original link: http://blog.csdn.net/livelylittlefish/article/details/2097805