
10/19/2004

Comments


Dan Shapich

Not having all the data, data of poor quality, incomplete data, and non-standardized data would all cause problems when trying to extract meaningful information. In order to sort through data and make something meaningful out of it, I would have to assume that all the information was accurate and valid unless I had some information telling me otherwise. By this I mean that if I were given data to sort through, I would assume the information is unadulterated. If the data had somehow been doctored, then a complete and accurate picture would not be possible without some degree of uncertainty.

Tina M. McHenry

A DBMS (database management system) allows users to create, edit, store, update, delete, and, most importantly, organize and make use of data that would otherwise go unused or be inaccessible to company employees, or anyone for that matter. I feel this type of organization system is extremely important and useful, and if used efficiently it can become a necessary asset to a corporation. As shown in the reading, it can be a very useful tool for inventory, mailing lists, etc., but when looking at the data that is being gathered and submitted in these systems, there can be problems.

As for the question of what problems might occur when extracting meaningful data: I think keeping data valid can be a difficult task if one allows one's own thoughts and viewpoints to conflict with the data collected. The way I would try to minimize the risk of my own bias affecting whatever I was gathering information for would be to ask others whether they felt the questions were fair and non-judgmental. The subjects' answers to a question will probably carry some bias of their own. Everyone views things differently, so minimizing participant bias, I think, would be extremely difficult. As for incomplete data, I would eliminate it from my data pool altogether.
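To make the create/store/update/delete operations above concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table and records are hypothetical examples, not anything from the reading.

```python
# A minimal sketch of the basic operations a DBMS provides
# (table name and data are made up for illustration).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ada", "Pittsburgh"))  # create/store
conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Erie", "Ada"))            # edit/update
print(conn.execute("SELECT * FROM customers").fetchall())                                # retrieve
conn.execute("DELETE FROM customers WHERE name = ?", ("Ada",))                           # delete
```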

Isiah Jones

When extracting and analyzing data you must know what you need the data for, which will help you pick out what you need from a particular set of data. The data that you determine is useful, for whatever reason, must also be complete. If your data is missing important features, then you won't be able to interpret it as well as you should. The size and source of your data could also cause a problem by increasing your chances of using the wrong material, causing a waste of time and resources that may be limited.

Michelle Cheung

When we are trying to extract meaningful data, there might be subjective bias from the interpreters that influences the findings. For example, when experts are hired by a company to extract information for it, there might be a bias, since they are paid to do so. Another problem that might occur is auspices bias, which is a response from subjects caused by the respondents being influenced by the organization conducting the study. For instance, when a gun association does research on gun usage, its results will be very different from the same research done by a welfare department; people tend to be influenced by the organization that conducts the survey. In order to extract valid and reliable data, I would plan the survey so as to avoid both interpreter bias and auspices bias. I would train the interpreters well and provide guidelines and rules for interpreting the results. Another way to avoid interpreter bias is to let an outside marketing research company do the job. Also, I would send out an anonymous group to conduct the survey, so the respondents would not be influenced by the organization during the interview.

Tom Wirth

I know from working with and seeing databases that retrieving online data can be a tricky task. If the user at any point fills in the wrong field with the wrong information or raw data, that could throw off the entire survey, order, or whatever they might be filling out. This incorrect information, once stored, could also render other data entries useless. My point is, when retrieving data from users, the programmer must make it as easy and as clear as possible what is being asked of the user. Even after the explanation and instructions are given, the programmer should apply safeguards just in case the user does mess up. For instance, if the user is ordering something and inputs an incorrect credit card number, the programmer should create a safeguard that informs the user of the mistake. This might be something as simple as making sure the credit card number and the name given match up. The scary thing is that with all the data traveling around on a daily basis, a simple mistyped character or a piece of information in the wrong field can really produce false information.

First off, in order to extract meaningful or valid data, the person who wants the data needs to make sure the source or person they are getting it from is indeed reliable and valid. If the user is a chronic liar or a person of strong bias, they may produce useless information. Also, in order to retrieve valid and useful data, the programmer should make the questions, and what is expected of the user, extremely easy to understand. They should use an easy-to-follow interface so as not to confuse the user. They need to emphasize the importance of honesty and of being either impartial or partial, depending on the information desired. Perhaps most importantly, they need to make sure the data can travel efficiently and securely to the database where it is stored. If data is ruined, stolen, or changed during the process of storage, then it is useless. Overall, while many factors such as user integrity and ease of data extraction play important roles in retrieving valid data, I tend to feel the most important part of the entire process has to be data security and information assurance.
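One concrete safeguard of the kind described above is a checksum test on the typed-in card number. The sketch below is illustrative only (real payment systems do far more); it uses the Luhn algorithm, which catches most single-digit typos before the order is ever stored.

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if the digits pass the Luhn checksum."""
    digits = [int(c) for c in card_number if c.isdigit()]
    if len(digits) < 13:  # too short to be a real card number
        return False
    total = 0
    # Double every second digit from the right; subtract 9 if it exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# A form running this check can tell the user about the typo immediately,
# instead of storing a bad record.
print(luhn_valid("4539 1488 0343 6467"))  # True  (a common test number)
print(luhn_valid("4539 1488 0343 6468"))  # False (one mistyped digit)
```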

Scott Kellerman

When extracting meaningful data, there are many problems that could possibly come about. There is the issue of lost data: if a database goes down, sure, there is a backup, but it is not as current as the one that went down. There are several methods used to fill in the gaps, but the result will not be as accurate as the original database was. Another possible problem is security: the question arises of who has access to the information, and to what information those persons have access. Once people extract data, they need to organize it well; otherwise the extracted data might be just as useless as raw data. Keys help you extract the data you want because you can search by a certain key element. Another thing to keep in mind is who has done the extraction. Don't forget, statistics lie. If the person extracting the data has a specific point to prove, then they can pick and choose what data to use in their final report, misleading others by manipulating the data to their own advantage.
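A minimal sketch of the key-based search mentioned above (the table and records are hypothetical): a primary key lets you pull exactly the record you want instead of scanning every row.

```python
# Key-based extraction with Python's built-in sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, item TEXT, qty INTEGER)")
conn.executemany("INSERT INTO inventory VALUES (?, ?, ?)",
                 [("A100", "widget", 40), ("B200", "gadget", 12)])

# Searching by the key element is a direct lookup, not a full-table scan,
# because the PRIMARY KEY is backed by an index.
print(conn.execute("SELECT * FROM inventory WHERE sku = ?", ("B200",)).fetchone())
```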

Mike O'Donnell

When extracting data many problems can occur. The biggest problem is false information: people who lie in the research can throw off the data, and you must decipher what you think is true from what is false. Lost data can cause a problem as well. Even though the data can be backed up, sometimes some of the information is lost, and the data can therefore be misconstrued. Some people might interpret the material incorrectly, making the results come out a different way. When recruiting people to take the questionnaire, or whatever instrument is used, there can be a problem with differing interpretations: respondents could think that you are asking them something different. The people doing the research must know how to pick out the good from the bad and make sure that the data comes out as it is supposed to, without little errors creeping into it.

Lance Gawel

As we continue to push our limits with technology, we must be careful with what we do with the data and make it as productive as possible. There will always be mistakes or problems that arise, but we must know how to correct them. Fraud is a huge concern that can arise from data input by companies and people. Computer programmers must know how to prevent false credit card claims and make sure that the right person is entering the data. Privacy is key in preventing fraud, and to protect it companies must secure their network databases in order to maintain public confidentiality. Every person has the right to keep their private information confidential, but that is not always the case: companies may sell their databases of information to other companies to use in a study, or to contact people, without the user's consent. Updating this information can become another problem that may make databases inaccurate or useless. Information is constantly changing, and it is very hard to keep databases instantly up to date because people are always moving, there may be a death, and babies are constantly being born. There are several problems to account for, and the researcher must know how to prevent them or else be stuck with unreliable and useless data.

Extracting valid and reliable data is harder than it may sound, but planning out the data you want is a great first step. You must make sure that people are able to understand what you are asking for and that they can give you the truth. If people are confused, they will tend to lose interest and just start filling in anything. You need data that will keep your results unbiased, and to get it you have to ask several kinds of people rather than stick to one group. Hiring experts to look at your data can be a good way to find skewed results or to discover that something is just not right. Finding the right database management system (DBMS) is most important in creating and updating data. Allowing users to create, edit, and update data in database files helps make sure you have correct and current information, and the ability to store and retrieve data from those files allows the researcher to access and use the meaningful data. It can mean success or failure for an organization. Databases will never become extinct, because without them information would be very difficult for organizations to analyze.

Rich Dominico

Producing meaningful information takes much effort and organization. When a worker is attempting to collect data, many factors must be taken into consideration. The worker must figure out who the targeted audience will be, what the most effective way of gaining information from them is, and how to make sure they are truthful. When obtaining human-generated information, it makes sense to use people who are familiar with the topic being discussed. If the respondents have an interest in the topic, then they will be more willing to take their time and answer appropriately. This way, the audience's opinion will carry more weight and will most likely be more truthful. With a little effort, the worker will find the right audience to give meaningful data. Obtaining valid information from respondents can be difficult. The worker can obtain data through surveys, but those are very impersonal and possibly invalid. Interviews are personal, but they are also very time-consuming. There really isn't a perfect way to gather human-generated information; it depends on the situation, the number of respondents, and so on. The most effective way to collect data is probably a mixture of surveys and interviews. Lastly, the worker needs the respondents to be truthful. This is where most of the worker's effort and organization comes into play. If the worker has researched the target audience and organized an effective way to keep the respondents interested, then he or she has done their best to obtain truthful and significant information.

Jason Streeter

Meaningful information is the only type of data that is useful to interpret; if the data isn't concise, it is basically useless. Solving the problem of collecting meaningless data, I think, begins with those conducting the study. If they are passionate about the study, then the results, and how they obtain the data, will be taken seriously. Also, the user must be familiar with the technology they are using, because without the data, computers lose a lot of their potential.

Other problems can occur with collecting data. First, the data can only be interpreted if it is entered into the database correctly. I learned in my statistics class how just one bad entry can throw off a whole correlation or ruin the mean. Also, you have to trust that the sample of people from whom the data is being collected is truthful; if they didn't take it seriously, then that data is pretty much worthless.

Data is how we survive in the business world. We collect it, analyze it, and make decisions based on company status, progress, opinions, and much more. Being able to comprehend the information and use it to your advantage can keep you a step ahead of those who don't pay attention to it.
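A quick illustration (made-up numbers) of the statistics-class point above: one mistyped entry is enough to drag the mean far away from every real value in the set.

```python
# How a single bad entry distorts a summary statistic.
scores = [72, 75, 78, 80, 74, 77, 79, 76, 73, 75]
clean_mean = sum(scores) / len(scores)

scores_with_typo = scores[:-1] + [750]  # "75" mistyped as "750"
bad_mean = sum(scores_with_typo) / len(scores_with_typo)

print(f"mean without the typo: {clean_mean:.1f}")  # 75.9
print(f"mean with one typo:    {bad_mean:.1f}")    # 143.4
```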

Ryan

Extracting meaningful information out of a database has two real main problems so far as I can see.

For one, there is a lot of room for user error when entering so much information into a database. Many times a person's only job is to sit at a desk all day and punch numbers and names into a database; I know personally that I am a pretty good typist and I still make mistakes all the time. A simple mistake in a database can completely throw off certain queries somebody might try to run against a table.

The other glaring problem I see is that many times the wrong information can be given or pulled when trying to establish meaningful facts. It's important that the information entered into a database contains everything that would ever need to be searched for at a later time. If all you enter about a person is their name and phone number, and you're trying to find people of a certain interest or area to call, you can't possibly know which ones to use. The same goes for receiving bad information to put into the database. This can happen for a bevy of reasons, be it untruthful data, bad polling, or even mix-ups while gathering the data. It seems to me that the software behind these databases works well, and that this is more of an implementation problem than anything else. If there were more sanity checks in these systems, and more failsafes in the data gathering itself, these databases could prove even more beneficial than they already have.

Angelica

One of the biggest obstacles to acquiring meaningful data is people who are not capable of providing that data because of their lack of knowledge on the subject. Another huge concern is those who simply don't care about the topic; collecting meaningful data from them would be almost impossible, not to mention pointless. If you don't have a plan of action, that will also make retrieving the specified data difficult. Certain technologies are also necessary for collecting even reasonable data, such as computers and communication hardware and software; group interaction and cooperation matter as well. In order to retrieve valid and reliable data, you have to incorporate some, if not all, of the technologies listed above. Conducting surveys, researching online and in libraries, and interviewing knowledgeable individuals are all excellent first steps toward collecting information.

Eric Anderson

There is a ton of supposedly meaningful information on the internet, in companies' databases, and in your own. When you, a company, or whoever takes this information, the biggest problem is: how do you know whether it is meaningful or meaningless? Because reports, websites, and books are written by human beings, they are often skewed, and the information they give out may not be valid. Another problem with received information is that you, as a customer, can be subjectively put into a category you don't want to be in, or for some reason shouldn't be in.

To help ensure "appropriate" customer service, some companies have initiated a grading system for their customers, so the customers with the top grade get the most attention from marketers and the best service. In other words, whoever has the most money gets treated the best, and people who are struggling stay that way. That also means the people with money get annoyed with calls from companies, while the people without the moolah get nothing. This "meaningful" information is produced from the money or spending habits of a person. Other supposedly meaningful information can come from people's ideas or opinions being taken as facts; the best example of this is children or students using the internet.

The only way to make sure that what you get from the internet is meaningful data is basically to make sure the information comes from a reliable source, reliable in the sense of being known for accuracy in that subject. For example, information on the French and Indian War should come from a history teacher or a well-known author on the subject. As for companies courting their best customers, that is an accurate and good use of meaningful data, because the people with the top grades do have the most money, which means they do not have to be frugal with it.

Jim Barron

Well, obviously, as a knowledge worker I would make sure my employer was set up with a DBMS (database management system). This would allow for improved availability, minimized redundancy, accuracy, program and file consistency, user-friendly data, and improved security, and it would also assist me in the task of extracting meaningful data. Truthfulness, quantity of responses, possible bias, and time constraints can lead to "stories," and these "stories" may not be valid information. Other things that could increase the chance of extracting bad information are not caring about what you are trying to find out, or not understanding what it means; the researcher must also be educated in the subject of the data, otherwise people will not take the work seriously.

When looking for data, we should be doing research that is randomized so as to represent a particular population. A bigger sample size helps here, as does making sure we aren't only getting a minority view. Also make sure that survey and interview questions are not easily swayed by opinion; you want a firm yes or no, or some sort of quantitative data that cannot be easily manipulated, so as to raise the chance of getting the truth. Once we have this data, we should organize it into databases in the most effective way possible. We can make it efficient and effective by incorporating smaller sets of data that are easy to search through. We should also make sure that access to the data is constrained by group; this will keep later studies from being biased by seeing other people's data.

All of these things will assist in extracting meaningful information, but it is almost impossible to hope that all data will be perfect. There are always going to be people who aren't truthful and who are biased. Researchers need to do their best to get good data that can be turned into meaningful information. Also, one of the biggest problems with databases is that people make mistakes; once again, nothing is perfect, and any little mistake can throw off a whole dataset.

Lance Weaverling

Getting meaningful information from data can be a hard task. Not only can you have bad data from individuals who contribute false information, but the way the data is interpreted can play a major role in getting meaning from it.

When a survey is sent out, or when people contribute data in other ways to a business or other organization, there is no way to tell whether the incoming data is honest information or something completely different from what was intended. You can't trust all people to be completely honest with the information they're sending you. One reason for this false information may be that some people just don't care: if they get a survey they don't want to fill out, they may start answering questions without caring at all.

I believe the other key concern about extracting meaning from data is the group of people interpreting it. What happens if a group misinterprets data and causes a business to lose money because of a bad decision? The information gathered wasn't very meaningful to the company at all. Training people to use and interpret the data may be one way to prevent this kind of misinterpretation.

If a business were to hire people to sort through data before it was put into a database, the amount of bad data coming in might be decreased, and the business would benefit from only good data entering its database. If there were no way to filter data before it entered the database, who knows how much bad data could make its way into the system, and no business or organization wants to make decisions based on bad data.

Overall, there is no way to completely stop bad data from coming in; there will always be people who purposely supply false information. Training employees to interpret information effectively, and training people to look for and discard bad data, may be the key ways to stop bad data from entering a database and to turn the right data into meaningful information.

Nathan Weaver

One problem that can occur with data collection involves sampling variation. Whenever we work with samples selected from populations, a certain amount of random variation is always introduced. This is unavoidable but must be accepted, because the variation is random, or due to chance. To minimize this problem, larger samples should be studied: the larger the sample size, the smaller the sampling variation. Another approach is to total the amount of variation and then decide whether the results appear reasonable and useful. When trying to extract data there may also be some variation in the instruments themselves. It is important to make sure that the same type of instrument was used throughout the study, and proper calibration of machines is absolutely vital before data collection or measurement begins if machinery of any kind is involved.

In order to create meaningful information, it is also important to make sure that selection bias has not occurred. This happens when there is a systematic difference between the characteristics of the people selected for the study and the characteristics of those who are not. For example, if a clinical trial is being done to evaluate a medical treatment and older people are excluded from getting the new drug, selection bias has occurred: older people may have a poorer prognosis, and the effectiveness of the drug may not be truly represented. Accurate data collection also depends on minimizing human error. Oftentimes variations in judgment occur; to avoid this as much as possible, it seems prudent to assign more than one person to a task, which allows for a kind of group consensus and helps avoid errors.

In order to create meaningful information, I plan to examine each step involved in the data collection as closely as possible. Reliable data interpretation starts with a good collection plan, which should include a description of the project, a clear explanation of the specific data that is needed, and a rationale for collecting it. Each step in the collection process should be explained and easily examined, procedures for analyzing the data should be spelled out and open to scrutiny by others, and information about the integrity and reliability of the data collectors should be obtained if possible. It is also important that I work carefully and accurately, so I plan on incorporating safeguards and checkpoints in all of my work. If all of these steps are followed, my results should be as accurate and reliable as possible.
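The claim that larger samples mean smaller sampling variation is easy to check with a small simulation (all numbers made up): the spread of sample means shrinks roughly as one over the square root of the sample size.

```python
# Sampling variation shrinking with sample size, using only the stdlib.
import random
import statistics

random.seed(42)
population = [random.gauss(100, 15) for _ in range(100_000)]

for n in (10, 100, 1000):
    # Draw many samples of size n and see how much their means wander.
    means = [statistics.mean(random.sample(population, n)) for _ in range(200)]
    print(f"n={n:5d}  spread of sample means: {statistics.stdev(means):.2f}")
# The spread drops by about sqrt(10) with each tenfold increase in n.
```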

Ryan Britt

Having correct information is very important to a company. When a large corporation has thousands of pieces of data in a large DBMS, a lot of the data depends on other data, so one small mistake can cause a large portion of the data to be wrong. While trying to extract meaningful data, individuals run into many problems. Bias, time constraints, and quality of responses are all problems workers can run into, as is the case when the person collecting data injects his or her own viewpoints into the data being collected, or influences how the person giving the data will respond. When incentives are involved, the respondent is more likely to give truthful information. If the customer has a large stake in the correctness of the information, for example when applying for a home loan or another large loan, they are more likely to be careful when giving their data; if it is for something smaller, such as a credit card purchase, they may give false information without even trying. Such problems can be reduced by using incentives. Standardized forms, read by a computer, are also more effective in reducing human error. This process requires less human interaction and therefore alleviates the problems that arise from a worker reading hundreds or thousands of pieces of data per day. Workers get tired, careless, and pissed off, and don't put as much effort into reading the data; machines, on the other hand, can read data 24 hours a day and still not make mistakes. The only error that can creep in here is when the person entering the data onto the form makes a mistake. Data is boring, and nobody likes to handle it, but it is possible to extract meaningful data if the right operations are used.

Mike Hollen

Well, since we are speaking about our class project, one thing is that the file the DB finds may be corrupted. For instance, anyone who has ever used Kazaa knows that probably six out of the ten songs you download turn out to be junk. Another problem with a DB can be how well the files contained in it are named. For instance, if I search for a Britney Spears song, the DB will bring up files with "britney" or "spears" in the file name. But if someone names all of their saved files "britney spears," the DB will bring up files not pertinent to the user's search.

To combat this problem, really the only way is to go through the pieces of information carefully. The best way is to keep clean, structured attributes on the files in the DB and then filter on them precisely. This is where the WHERE clause in SQL comes in: it allows us to sift through the wealth of information in the DB (e.g., "I want all Britney Spears songs that are less than 3 minutes"). Other than that, not a whole lot can be done, at least to my knowledge.
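Here is a minimal sketch of that WHERE filter, run through Python's built-in sqlite3 module; the table and the song durations are made up for illustration.

```python
# Filtering rows with a WHERE clause instead of matching on file names.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (artist TEXT, title TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO songs VALUES (?, ?, ?)",
    [("Britney Spears", "Song A", 172),   # durations are hypothetical
     ("Britney Spears", "Song B", 205),
     ("Britney Spears", "Song C", 241)],
)

# "All Britney Spears songs that are less than 3 minutes"
for row in conn.execute(
        "SELECT title, seconds FROM songs WHERE artist = ? AND seconds < ?",
        ("Britney Spears", 180)):
    print(row)  # only ('Song A', 172) matches
```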

Dan Snyder

There are other problems than truthfulness issues in database analysis. Problems can also come from having non-standardized data, but the thing with non-standardized data is that, if it occurs, it's really the fault of the programmer. The database should have everything programmed so that when someone answers a query in an incorrect or non-standardized manner, a default error message appears. There are still problems extracting meaningful data even when everything is entered correctly. Those can come from the sample size and the population of your database: if your database is small and you extract information from it, your results may be skewed because your pool may not represent the general population.
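A minimal sketch of the kind of entry-time check described above (the field and pattern are hypothetical examples): non-standardized input gets an error message instead of getting into the database.

```python
# Rejecting non-standardized input before it pollutes the database.
import re

US_ZIP = re.compile(r"^\d{5}(-\d{4})?$")  # hypothetical standard for this field

def validate_zip(value: str) -> str:
    value = value.strip()
    if not US_ZIP.match(value):
        raise ValueError(f"'{value}' is not a standard ZIP code")
    return value

for attempt in ("16802", "168o2"):  # second one has a letter 'o', not zero
    try:
        print(validate_zip(attempt), "accepted")
    except ValueError as err:
        print("rejected:", err)
```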

dana

When thinking about the question "Can you think of some other problems that may occur when trying to extract meaningful data?" a lot comes to mind. It can be anything from how you varied the samples you were extracting, to whomever you were sampling providing false information, not completing the survey, or running out of time. Another problem is how you target your audience and distribute the surveys: there has to be some sort of random sampling done so the information isn't biased. Other problems can be biased questions, or, as sometimes happens to me when I'm doing a survey, not fully understanding what's being asked and just answering the question anyway. So I think a lot of planning, and breaking down exactly what you want out of the surveys, needs to be done to prevent problems. I don't think there is any way to get perfect data; there's always a margin of error. But planning out your work carefully and generating a well-populated list will really help cut down the margin of error, though there's no way to truly eliminate people lying and so forth.

Bilal Zaki

Sometimes people ask the wrong questions when trying to gain specific data in surveys or interviews, and the answers get interpreted differently than what was really meant. Sometimes people also take parts of data while leaving out key elements. For example, George W. Bush gave a statement earlier this year about terrorist threats: "Our enemies are innovative and resourceful, and so are we," he said. "They never stop thinking about new ways to harm our country and our people, and neither do we." Now, I'm not even sure what he actually meant in that statement, but if someone heard only that part of his speech, then of course there would be no way he would get elected again, right? Well, his next statement was about never stopping to think about defending this country, which tried to clarify the first one. Data is a sensitive thing, and it can manipulate the way people make decisions when they are fed data that is true but incomplete. The more data there is, and the better it is organized, the better the outcome of a decision will be. However, some data should not be accessible to everyone, although that can be argued, as in the physician example in the reading. One problem with data recording is that data can change, and it can be difficult to apply those changes, such as a change of address, phone number, or name. To extract meaningful data you must make sure you are asking the right questions for the data you are trying to obtain, include all the facts, and organize everything properly.

Rich

People are always going to give biased responses to questions. Depending on the mood they are in, whether they have had a good day or a bad day, whether they are married or unmarried, or whether they are a guy or a girl, all of these play a role in a person's response. With all of the newly emerging technology, I feel data collection becomes both easier and more difficult. Easier in that more people can be reached in less time, and the researcher can make it very simple for the person to respond; harder in that the study will be less personalized, and it is much easier for the respondent to just click through buttons really fast to get done with it without ever really reading or understanding. There is no real way to solve this problem. If you get a researcher who is really excited about the research, they might be biased toward certain results, and if you only ask people whose answers you already know, then there is no point in the study. It is tough to say how I would get meaningful data, but I think I would always use a large study group: the more people responding, the better an idea you have of which way to go, and the smaller the margin of error, even though a larger group takes more time and resources to survey.

Michelle

Extracting meaningful data can become a difficult task, especially when you are unsure of the accuracy of the data. Data accuracy can be influenced by how the data was collected. In a survey format, people are usually uninfluenced by others and have a higher probability of telling the truth, whereas in a focus group or similar setting, people are usually influenced by the presence of their peers. This can lead to inaccuracies, but the interviewer should be able to distinguish accurate information from people just going along with the group. Some other possible obstacles to obtaining accurate information are:

1. People do not understand the questions being asked, as in a survey, or do not understand what kind of response to give. If people do not understand what is being asked of them, they will most likely give inaccurate information that does not pertain to the question. If this information is then used when compiling the data, the results will be inaccurate.

2. People might not want to participate in offering up information about themselves and their habits. If the surveying group is unable to get enough people to participate in the survey or other information gathering, the data will be inaccurate. The results will be skewed in one direction instead of being a true representation of the population, which will put inaccurate information into the database if it is not taken into account when entering the data or performing statistical analysis on it.

To extract meaningful data, researchers need to be careful about who they are receiving information from. If they do not take into account the above factors, or the factors listed in the question, then the data will be inaccurate. If people do not understand the questions, then time needs to be taken to rewrite or rephrase them so the majority of the population can understand them. If not enough people are willing to participate, proper statistical analyses need to be performed to account for it. I think that if information gathering is done properly, data can be accurate; there is just a multitude of factors that one needs to be aware of and take into account.

Betsy

When gathering data from interviews, reports, journals, or books, it's difficult to know whether the data is accurate or a made-up story. As knowledge workers, this makes our job of collecting data time-consuming and complicated: we have to be able to sort through data, determining what is false and what is meaningful. Data is said to be only as valuable as our ability to access and extract meaning from it, and we cannot extract meaning from it without organizing, storing, and analyzing it effectively. When looking at data we have to consider many things, such as the respondents' truthfulness, the quantity of responses, possible bias, and time constraints. Looking at these possibilities enables us as knowledge workers to pick through false data and determine what is accurate.

When looking at data, knowledge workers have to be informed about what they're looking for. By having a good, solid understanding of the data they're looking at, they'll have a better understanding of what is accurate. Gathering as much information as possible, and from many different resources, can also help you determine whether there is a false pattern in the data. When looking at data such as surveys, you need to know exactly how the survey was administered to determine whether there were any biases or falsehoods. Each type of data needs to be organized and researched thoroughly: how it was administered, who wrote it (reports, journals, etc.), and the time at which it was taken. By evaluating each piece of the research, a knowledge worker can determine how accurate the data is.

Gathering data can be a huge task, mainly because there is so much false information in the world, from simple human errors to mechanical computer errors. As knowledge workers, it is our job when doing research to make sure that the information we're using is accurate. In order to do that, you have to be willing to complete the steps of extracting meaning from the data you have.

jon yen

As economists, we deal with statistics every day. Most of our work relies on a large body of information that we use to compile results, and with information you will always have a mean and deviation from the mean. What we want is enough participants to narrow this deviation. An example is a coin flip. Probability says there is a fifty percent chance of flipping a coin and getting heads. However, if we flip this coin 5 times, we may get heads all 5 times. A better measure is to flip the coin a hundred times: we will then get a percentage close to fifty percent, though maybe not exact.

Another problem that can occur is confounding factors. Survey results can differ with demographics such as the region you live in, sex, age, and heritage, and these variables can all alter the way a survey comes out. Take State College: we could ask, "Do you think illegal file sharing is OK?" We might get a sample of eighty participants with eighty percent saying yes, but if we conducted the same survey in Washington, DC, we might get the reverse result. Problems such as time constraints and untruthfulness will happen, but the larger the sample, the less these abnormalities matter.

Another useful tool is the statistical method called regression. Say we want to find out why NBA players are paid so highly. We could look at data such as points scored, but is that valid on its own? Points may have a positive correlation with salary, but other statistics such as rebounds and steals have an influence too, as do factors such as draft pick, family, race, size, jail time, and intelligence. With regression, we combine all this information and can see how much influence each factor has on salary. We might find, for instance, that scoring explains sixty percent of the variation in pay: a huge impact, but not the whole story. On the other end, jail time might turn out to have a 0.008% influence on salary, a weight so small that we can drop it from the factors that determine pay. Regression is a useful method for separating valid data from the stuff that doesn't really matter.
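The coin-flip point above is easy to verify with a few lines of simulation (illustrative only): with 5 flips, extreme results are common; with thousands, the proportion of heads settles near 50%.

```python
# The observed proportion of heads converges toward 50% as flips grow.
import random

random.seed(7)
for n in (5, 100, 10_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:6d} flips -> {heads / n:.1%} heads")
# With only 5 flips, wild results (even 100% heads) are quite possible;
# with 10,000 flips the proportion sits very close to 50%.
```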

