As expected, the Fed "skipped" a rate hike, while signaling two more increases this year

Reporter Cui Puyu

After raising interest rates 10 times in a row, the Federal Reserve announced on Thursday that it would keep the federal funds rate unchanged in the range of 5%-5.25%, while signaling that if the economy and inflation do not cool further, it will lean toward raising rates again later this year.

Most officials expect two more rate hikes this year, and they raised their growth and inflation projections in the accompanying economic forecasts.

"Keeping the target range unchanged at this meeting will give Committee members time to evaluate more information," the policy statement issued after the meeting read. The interest rate decision was unanimously passed by Fed officials.

At the press conference held after the meeting, Federal Reserve Chairman Powell noted that almost all Fed officials believe it is appropriate to raise rates "further" in 2023 to bring down inflation. However, he did not say whether the next hike would come at the July meeting.

"Inflation pressure remains high, and there is still a long way to go to bring the inflation rate down to 2%," Powell said. At the same time, he also pointed out that considering the previous rate hike, the Committee believes that it is prudent to keep interest rates unchanged this month, and the suspension is a continuation of the slowdown in policy measures.

"We have recovered a lot of lost land, but the impact of the austerity policy has not yet been fully realized," Powell said.

Since March 2022, the Federal Reserve has carried out its fastest series of rate hikes since the 1980s, raising the federal funds rate by 5 percentage points. This year the central bank slowed the pace, raising rates by 25 basis points at each of the past three meetings, most recently in May.

It was widely expected beforehand that the Fed would "skip" this meeting. Officials prefer "skip" to "pause," because "pause" implies that rates may remain unchanged for a long time.

Powell and Fed Vice Chair Philip Jefferson had hinted in recent speeches that the Fed would stay put at this meeting to give policymakers more time to assess the effects of previous rate increases and of banking-sector stress.

What was surprising was the "dot plot" of the interest rate path released this time. It shows that policymakers expect the median policy rate to rise to 5.6% by the end of the year, up from the 5.1% forecast in March.

Among the 18 policymakers, 12 expect the rate to end the year in or above the median range of 5.5%-5.75%. At 25 basis points per hike, that implies two more increases over the remaining four meetings this year. Of the remaining six, two expect no further hikes this year and four expect one.
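The arithmetic behind that implication is simple; as a sketch (figures from the article, assuming 25-basis-point steps):

```python
# Two more 25 bp hikes from the current 5.00%-5.25% target range
# land at 5.50%-5.75%, whose midpoint rounds to the 5.6% median dot.
hike = 0.25
low, high = 5.00 + 2 * hike, 5.25 + 2 * hike
midpoint = (low + high) / 2
print(low, high, midpoint)  # 5.5 5.75 5.625
```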

Policymakers also raised their rate expectations for the next few years: the federal funds rate is projected at 4.6% in 2024 and 3.4% in 2025, up from the 4.3% and 3.1% forecast in March.

At the same time, the projections imply that if this year's outlook holds, the Fed would cut rates by a full percentage point next year. The long-run expectation for the federal funds rate remains 2.5%.

Fed officials also raised their forecast for economic growth this year from the 0.4% projected in March to 1%. They are more optimistic about the job market as well: the unemployment rate is expected to be 4.1% by year-end, versus the 4.5% forecast in March.

Powell pointed out that the current labor market is still very tight. According to the latest data, 339,000 new jobs were created in the United States in May, but the number of job vacancies remained high.

On inflation, Fed officials raised their expectations for core inflation (excluding food and energy) to 3.9%, and slightly lowered their expectations for overall inflation to 3.2%. The expectations in March were 3.6% and 3.3% respectively.

The inflation rate has dropped from last year's peak, but it is still significantly higher than the Fed's 2% target. The central bank's preferred inflation gauge, the personal consumption expenditures (PCE) price index, rose 4.4% in April, while core PCE rose 4.7%.

Another inflation index also shows encouraging signs of slowing down. The consumer price index (CPI) increased by 4% year-on-year in May, the lowest level in more than two years. However, the core CPI rose by 5.3% year-on-year, slightly higher than expected.

Powell said that policymakers still see inflation risks as tilted to the upside, but that the risks of doing too little and doing too much are "approaching equilibrium"; once inflation falls significantly, rate cuts may still be "several years" away.

Hit by the Fed's hawkish signal, the three major US stock indexes all fell in intraday trading, but stabilized after Powell said no decision had been made about the July meeting. At the close, the Dow Jones index fell 0.68%, or 232.79 points, to 33,979.33; the S&P 500 rose 0.08% to 4,372.59; and the Nasdaq rose 0.39% to 13,626.48.


Iran's latest countermeasures expose the tensions caused by the United States' relentless "extreme pressure"

CCTV News: Last year, the United States unilaterally withdrew from the comprehensive agreement on the Iranian nuclear issue and restarted sanctions against Iran, and has recently stepped up its "extreme pressure" campaign. Faced with this aggressiveness, Iran has recently taken a number of measures to push back against the pressure from the United States.

After officially announcing its withdrawal from the Iran nuclear deal last year, the United States resumed economic sanctions against Iran; the sanctions on Iranian oil exports struck directly at the lifeline of Iran's economy. At the same time, the United States has kept ratcheting up pressure on other fronts, including designating Iran's Islamic Revolutionary Guard a terrorist organization and sending more troops to the Middle East. In addition, after the recent oil tanker attacks in the Gulf region, the regional situation has become even more complicated.


On May 8 this year, the first anniversary of the United States' unilateral withdrawal from the Iran nuclear deal, Iranian President Rouhani announced that Iran would suspend implementation of some provisions of the deal and would no longer sell its heavy water and enriched uranium abroad. Iran hoped to negotiate its rights and interests under the agreement with the other signatories within 60 days; if its demands were not met, Iran would no longer limit the enrichment level of its uranium enrichment activities.

On the 17th of this month, the Atomic Energy Organization of Iran issued a statement announcing that Iran would exceed the 300-kilogram cap on low-enriched uranium by June 27. According to the organization, Iran has increased its enriched-uranium production capacity, but its product remains low-enriched. Analysts believe that breaching the cap is a new step beyond the nuclear deal and Iran's latest countermeasure against the United States' constant "extreme pressure." Iran made this move at a time when US-Iran relations remain tense and the regional situation is complicated and changeable, probably to increase its bargaining chips with the United States.

"There is nothing that can’t be solved in a hot pot" special edition, and all the staff are amazed.


1905 Movie Network news: On March 29, the suspense comedy released a "Surprise" special edition, gathering all kinds of startling, surprising, amazing and hair-raising behind-the-scenes tidbits. The special exposes the cast's grimy-makeup looks for the first time: the group, some masked, some bruised, all have their true faces obscured, leaving people wondering what kind of drama will play out among them.


The movie "Nothing can't be solved in a hot pot" tells an absurd story full of suspense and joy: four strangers dividing loot in the backstage warehouse of a theater are unexpectedly drawn into a murder case. The bubbling hot pot rolls with endless suspense, and greed and deception lead to a series of reversals. Finally the four characters' special identities gradually surface, and the mysterious truth comes to light...


Reveal upon reveal! Over a hot pot, Yang Mi goes "black-faced" on the spot and Yu Qian nearly "sacrifices" himself


In the special edition, Yang Mi says: "This is a very interesting suspense film with black humor!" As if to match her words, she then spends the whole time "black-faced," wearing a thick black mask. Yu Ailei, playing opposite her, sincerely praises her: "Applause for you." Clearly Yang Mi's performance was so subversive that her co-stars Yu Qian, Yu Ailei and Li Jiuxiao never saw it coming. She was happy to create a role completely different from anything before, because in her view actors should take on all kinds of challenges in order to keep surprising the audience.


If Yang Mi is "black-faced" throughout, then Yu Qian is "green-faced" throughout. In the special edition, Yu Qian is thoroughly doused with green paint. Still shaken, he said: "I didn't just sacrifice a lot for this film, I almost got sacrificed!" And that is not even his biggest contribution. Yu Qian reportedly has many action scenes, both beating and being beaten; it may be the movie with the most action scenes since he started acting. This all-out, sacrifice-be-damned performance makes people eager to see what Yu Qian delivers in the film.


Plenty of surprises! Tian Yu is pushed to "show some skin," and Li Jiuxiao shocks Yu Ailei into "losing his voice"


The director says: "The whole creative process was full of novelty and fun." Yang Mi adds: "It is full of imagination." Clearly everyone is confident in the story. Tian Yu blurts out: "The director's ideas are really strange!" Why? Because the director apparently wanted Tian Yu to show more skin, and Tian Yu objected with both hands raised: "I've shown enough, and it isn't pretty!" The standoff ended with Tian Yu appearing in a very festive red thermal-underwear costume, which immediately drew thunderous applause and laughter from the audience.


Director Ding Sheng asked the leading actors to let their acting loose, trying to give every shot an astonishing touch. Of the creative process, Li Jiuxiao said: "It felt thrilling, and everything was different from what we had imagined!" As a result, much of the time the performances did not feel like acting: Yu Ailei suddenly slams the table and Yu Qian's whole body tingles; Tian Yu is scared into silence mid-line; and Li Jiuxiao's one-way barrage shocks Yu Ailei into losing his voice, lamenting repeatedly that "fate rises and falls, and keeps reversing!"


The suspense comedy "Nothing can't be solved in a hot pot" will be released on May 1. Stay tuned!


World Hypertension Day | Why are more and more young people getting hypertension? Beware of "invisible" high blood pressure

CCTV News: When it comes to hypertension, many people used to think of it as a disease of the elderly, but its incidence is now clearly skewing younger. Surveys show that in China the prevalence of hypertension among people aged 35 to 44 reaches 15%. Why are more and more young people getting high blood pressure?

Hypertension has a clear genetic tendency. Research shows that if both parents have hypertension, their children's incidence is 46%; if one parent does, it is 24%. On top of such genetic factors, ignoring the influence of environment and diet on blood pressure can lead to hypertension.

Jiang Xiongjing, Deputy Director of the Vascular Center of Fuwai Hospital, Chinese Academy of Medical Sciences: Young people today face heavy work pressure, rich nightlife, or unhealthy diets. Someone whose parents had high blood pressure might originally have fallen ill in his fifties or sixties, but may now start falling ill in his thirties.

Experts remind that people with high-salt diets, long-term heavy drinking, smoking, obesity, lack of physical activity, chronic mental stress and poor sleep are all at high risk of hypertension and should measure their blood pressure frequently.

Jiang Xiongjing, Deputy Director of the Vascular Center of Fuwai Hospital, Chinese Academy of Medical Sciences: If a young person's blood pressure does not come down once hypertension is found, medication should begin as soon as possible. This protects the cardiovascular system, so that more serious conditions such as myocardial infarction, stroke and aortic rupture can be avoided as much as possible.

Beijing BJ60 price cut in the Zibo area: 30,000 yuan off, not to be missed

Welcome to the [Autohome Zibo Discount Promotion Channel], bringing you the latest car-market news. High-profile models are currently running an unprecedented promotion in Zibo. The popular SUV Beijing BJ60 is drawing buyers with a cash discount of up to 30,000 yuan, bringing the minimum selling price down to a very competitive 234,800 yuan. This is an excellent time to buy. To seize the opportunity and get detailed offer information and real-time transaction prices, click "Check the car price" in the quotation form and let professional consultants help you secure the maximum discount.


The exterior design of the Beijing BJ60 emphasizes a hardcore off-road style. The front face adopts a large family-style air-intake grille with bold chrome trim, conveying strength and stability. The body lines are smooth yet angular, and the overall style balances function and aesthetics, leaving a deep impression.


The Beijing BJ60's side profile combines toughness and finesse. The body is 5040 mm long, 1955 mm wide and 1925 mm high, with a 2820 mm wheelbase, giving it a spacious, planted stance. Front and rear tracks of 1620 mm provide stable handling. The 265/65 R18 tires, matched with finely styled rims, enhance the vehicle's dynamic look while ensuring grip and ride comfort.


The interior design of the Beijing BJ60 is refined and practical, combining luxury and functionality. The leather-wrapped steering wheel offers good grip and comfort and can be manually adjusted for height and reach, giving the driver the best operating position. A 12.8-inch touchscreen stands on the center console, with a clear display and rich infotainment functions, including voice-recognition control for convenient operation. The front seats are upholstered in synthetic leather with 4-way adjustment (fore-aft, backrest, height and lumbar support) plus heating, ventilation and massage, delivering a high level of comfort. The driver's seat adds a power memory function for quickly recalling settings. The rear seats support backrest adjustment and fold down in proportion for flexible cargo space. Overall, the interior reflects the BJ60's high-end positioning and user-friendly thinking.


The Beijing BJ60 is powered by a 2.0T turbocharged engine with a maximum output of 120 kilowatts (163 horsepower) and a peak torque of 400 Nm, paired with an 8-speed automatic transmission for smooth power delivery and a refined driving experience.
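The two power figures are consistent with each other; as a quick sketch (using the standard conversion 1 metric horsepower ≈ 0.7355 kW):

```python
# Convert the BJ60's quoted 120 kW maximum power to metric horsepower (PS).
kw = 120
ps = kw / 0.7355  # 1 PS is defined as 0.7355 kW
print(round(ps))  # 163
```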

To sum up, the Beijing BJ60 has won recognition from @Autohome owners with its imposing yet composed exterior design. Opinions on the styling vary, but that has not stopped it from leaving a deep impression on some users. Everyone's tastes and needs differ; as @Chicken Thief 2014 put it, "some prefer radishes, some prefer greens," and the Beijing BJ60 clearly satisfies some owners' pursuit of the classic and the practical.

An IPO born of "gutter oil": the couple controlling Fengbei Bio have "cashed out" more than 50 million yuan in five years

May 24, 2023, Shanghai: the Fengbei Bio booth at the 2023 China International Agrochemical and Crop Protection Exhibition. People's Visual file photo

Waste cooking oil, once a public scandal, has been turned into a good environmental business by this company, which not only exports overseas but also plans to list on the A-share main board.

Recently, the IPO application of Suzhou Fengbei Biotechnology Co., Ltd. ("Fengbei Bio") was accepted by the Shanghai Stock Exchange. It is a high-tech enterprise in the comprehensive utilization of waste resources, mainly turning waste oils and fats into resource products. Waste oils and fats are animal and vegetable oils, and various oil by-products and scraps, that do not meet edible standards and are generated in catering services, food processing, oil refining and oil storage; once they flow into sewers they become "gutter oil."

According to the prospectus, biodiesel is currently the best outlet for waste oil. Biodiesel places low requirements on raw-material quality, covering almost all kinds of waste oil. Besides serving as a biofuel, it can also be used to produce bio-based materials, giving it broad application prospects and strong economics.

The fundraising project has been put into production.

According to the data in the prospectus, from 2020 to 2022 (referred to as the "reporting period"), the operating income of Fengbei Bio was 790 million yuan, 1.296 billion yuan and 1.701 billion yuan respectively, and the net profit attributable to the owners of the parent company was 48.9986 million yuan, 102 million yuan and 133 million yuan respectively.

Fengbei Bio’s main business income is mainly the comprehensive utilization of waste oil resources, supplemented by oil chemicals business. During the reporting period, the sales revenue of the comprehensive utilization of waste oil resources accounted for 69.93%, 77.81% and 79.53% of the main business income, respectively.

In addition, during the reporting period, the overseas sales revenue of Fengbei Bio was 163 million yuan, 446 million yuan and 672 million yuan respectively, accounting for 20.66%, 34.50% and 39.41% of the main business income respectively. Fengbei Bio’s export products are mainly used in the European market.

Fengbei Bio said in the prospectus that only a few large-scale enterprises in China have the capability to treat low-quality waste oil. Zhuoyue Xinneng (688196.SH) is China's largest biodiesel enterprise by export volume, with 500,000 tons of existing biodiesel capacity. Jiaao Environmental Protection (603822.SH) is a large, influential biodiesel producer whose products meet the EU EN14214 standard, with 300,000 tons of capacity. Longhai Bio (836344.NQ) mainly produces biodiesel and plant asphalt, with 60,000 tons of biodiesel capacity. Fengbei Bio currently has 90,000 tons of biodiesel capacity, with 350,000 tons under construction.

According to data disclosed in Biofuels Annual-China, from 2020 to 2022 China's biodiesel output was about 1.28 million, 1.61 million and 2.14 million tons, while Fengbei Bio's output over the same period was 50,000, 75,000 and 90,000 tons, corresponding to a biodiesel market share of about 3.91%, 4.66% and 4.21%. Fengbei Bio claims to be in the first echelon of China's waste-oil comprehensive-utilization industry.
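The quoted market shares follow directly from those output figures; a quick check (numbers as reported above):

```python
# Fengbei Bio's biodiesel output divided by national output, in percent.
national = {2020: 1_280_000, 2021: 1_610_000, 2022: 2_140_000}  # tons
fengbei = {2020: 50_000, 2021: 75_000, 2022: 90_000}            # tons
shares = {y: round(fengbei[y] / national[y] * 100, 2) for y in national}
print(shares)  # {2020: 3.91, 2021: 4.66, 2022: 4.21}
```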

It is reported that Fengbei Bio intends to raise 1,000,000 yuan in this IPO, mainly to fund a project with an annual output of 300,000 tons of methyl oleate, 10,000 tons of industrial-grade mixed oil, 50,000 tons of agricultural microbial agents and 10,000 tons of compound microbial fertilizers, with by-products of 50,000 tons of biodiesel and 82,000 tons of glycerol. As the fund-raising projects come on line, Fengbei Bio's output is expected to increase significantly. According to the prospectus, citing the feasibility study, and on the premise that all economic factors match the study's assumptions, the company expects the project at full production to add 3.944 billion yuan (excluding tax) to annual operating income.

However, the risk of industry overcapacity deserves attention. Fengbei Bio cautioned that, as several mainstream biodiesel producers have announced large capacity-expansion plans, the gradual arrival of new capacity may create an overall supply-demand imbalance if capacity comes on line faster or larger than expected, intensifying market competition.

Pre-IPO "cash-out" exceeded 50 million yuan

It is worth noting that before the IPO, Pingyuan, the company's actual controller, and his spouse Han Linlin cashed out more than 50 million yuan by selling equity in companies they controlled to Fengbei Bio and through dividends.

Fengbei Bio was founded in July 2014 as its predecessor Fengbei Limited, which was positioned as the R&D entity and mainly engaged in oil-chemicals business. In the same period, Pingyuan developed the waste-oil comprehensive-utilization business through Liangyou Grease and Weige Bio, controlled by himself or Han Linlin, while Fuzhiyuan, like Fengbei Limited, also engaged in the oil-chemicals business.

Before the acquisitions, Weige Bio, Fuzhiyuan and Liangyou Grease were all controlled by Pingyuan or Han Linlin. In 2018, saying it wanted to reduce related-party transactions and horizontal competition and enhance business synergy, Fengbei Bio announced the acquisition of all the shares of Weige Bio, Liangyou Grease and Fuzhiyuan.

According to the prospectus, before the reorganization Han Linlin and Wei Liang (son of director Wei Guoqing) held 70% and 30% of Weige Bio respectively. In December 2018, they transferred 100% of Weige Bio's equity to Fengbei Limited, with Han Linlin receiving 33.68 million yuan and Wei Liang 14.4 million yuan.

Pingyuan and Han Linlin held 80% and 20% of Fuzhiyuan respectively. In December 2018, they transferred 100% of Fuzhiyuan's equity to Fengbei Limited, with Pingyuan receiving 4,073,800 yuan and Han Linlin 1,018,500 yuan.

Li Yin and Wei Liang held 70% and 30% of Liangyou Grease respectively, with Li Yin's stake held in trust for Pingyuan for management reasons. In January 2019, they transferred 100% of Liangyou Grease's equity to Weige Bio, with Li Yin receiving 700,000 yuan and Wei Liang 300,000 yuan.

Through these transfers, Pingyuan and Han Linlin cashed out 39.4723 million yuan.

In addition, Fengbei Bio paid a large dividend before the IPO. On March 4, 2022, Fengbei Limited held its first shareholders' meeting of 2022 and approved a profit-distribution proposal totaling 15 million yuan. At the time, Pingyuan directly and indirectly controlled 85.40% of Fengbei Bio's shares, corresponding to a dividend of about 12.75 million yuan.

In other words, before the IPO, Pingyuan and Han Linlin took in 52,222,300 yuan through equity transfers and cash dividends.
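The 52,222,300-yuan figure is the sum of the equity-transfer proceeds and the dividend reported above; as a quick check:

```python
# Equity-transfer proceeds (yuan): Weige Bio, Fuzhiyuan (both sellers), Liangyou Grease.
transfers = 33_680_000 + 4_073_800 + 1_018_500 + 700_000
dividend = 12_750_000  # the reported ~12.75 million yuan share of the 15-million distribution
print(transfers, transfers + dividend)  # 39472300 52222300
```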

The asset-liability ratio is high

It is worth noting that Fengbei Bio's consolidated asset-liability ratio over the reporting period was 34.67%, 36.20% and 45.25% respectively, rising year after year. Fengbei Bio said the increase was mainly due to new long-term loans taken out in 2022 for project construction, and that the ratio remained at a reasonable level throughout the reporting period.

Fengbei Bio said that after the raised funds are in place, the company’s total assets and owner’s equity will increase substantially, and the level of asset-liability ratio will decrease, which will help improve the company’s debt financing ability, optimize the company’s capital structure and enhance its ability to prevent financial risks.

It is worth mentioning that Pingyuan, Fengbei Bio's actual controller and chairman, directly holds 59.78% of the shares and indirectly controls a further 16.94% and 8.68% through Zhonghe Business and Fubei Huiying, for 85.40% in total. After this issuance, Pingyuan will remain in an absolute controlling position and can significantly influence the company's production and operation decisions.

Fengbei Bio has not been active in external financing; before the IPO it introduced only two outside institutions. In December 2019, Yida Capital invested 20 million yuan through Yuquan Yida and 10 million yuan through Yangzhong Yida, at an average cost of 3.48 yuan per share.

In December 2020, Shanghai Zhishi Enterprise Management Consulting Partnership (Limited Partnership) ("Shanghai Zhishi") subscribed for part of the registered capital at an average cost of 3.78 yuan per share. Between the two financings, Fengbei Bio's post-money valuation rose only modestly, from 370 million yuan to 400 million yuan.

According to Yida Capital's official website, the firm was established through the internal mixed-ownership reform of Jiangsu High-tech Investment Group, a well-known venture capital institution. Before the IPO, Yida Capital held 5.34% of the shares through Yuquan Yida and 2.67% through Yangzhong Yida; Shanghai Zhishi's shareholding is 4.92%.

Blue Whale interview | Guo Tingli, General Manager of Xinyizhan Insurance Network: refining service down to the differences in doctors' discharge summaries

As a product of the fine division of labor in the insurance market, insurance intermediaries have grown rapidly since they appeared. The keywords of the intermediary track, such as the crowd model, the separation of underwriting and distribution, independent agents, high quality and service advantages, are bound up with practitioners' exploration and thinking. Recently, Blue Whale Insurance interviewed Guo Tingli, general manager of Xinyizhan Insurance Network, on topics such as internet plus insurance, customer acquisition and the value of service.

In 2011, Xinyizhan Insurance Agency Co., Ltd. ("Xinyizhan") was established; in August of the following year, the Xinyizhan Insurance Network began operating. In 2016, Xinyizhan was listed on the New Third Board. As a national professional insurance agency, it provides insurance products and services to users through the multiple terminals of the Xinyizhan Insurance Network.

Online and offline are just means; executives should consider the efficiency and cost behind them

Founded out of a technology company, Xinyizhan's starting point was recognition of the value of insurance products and services. Guo Tingli believes that insurance offers Xinyizhan value and prospects that accumulate through long-term service.

An internet pedigree is one of Xinyizhan's labels: relying on its technology-company shareholder, the Xinyizhan Insurance Network is the core of the business. Yet Guo Tingli does not fully endorse the label "internet insurance platform."

"The significance of the Internet platform lies more in tools to help salespeople provide better services to customers, but the new stop is to stand on the track of insurance sales and provide professional services to customers with’ technology+professionals’."

"From the perspective of insurance, online and offline cannot be completely separated. Consumers know about products online and sign orders online; Interact with insurance salespeople offline, sign bills online, and return to offline claims. So how do you define this as online behavior or offline behavior? " This is a problem that Guo Tingli threw back to the industry.

"From the perspective of management, the core action of sales is to complete insurance, and how to complete insurance is the form." Guo Tingli pointed out that it is actually difficult to distinguish between online and offline steps. It is of limited significance to discuss this issue. As an executor, it is important to judge which form is cheaper and more efficient.

"I don’t care which part of the whole process is completed online and which part is completed offline. What I care about is how the customer finally finished it in a few steps, how it feels, and how we can connect all the steps in series with technology."

"Xinyi Station is a company that provides insurance sales services with technology," said Guo Tingli.

Guo Tingli, General Manager of Xinyizhan Insurance Network

There is still no regular pattern for converting traffic into repeat customers

Insurance is a market that revolves around "people," and the first step in generating interaction is acquiring customers.

Acquiring customers is not just making contact, or simply closing one deal. Internet companies entered the insurance market with their own traffic advantages: huge customer bases, high-quality streaming-media advertising and consumption platforms. Yet, Guo Tingli pointed out, it is not easy for consumers on those platforms to take root in the insurance field and keep spending. So far, no regular conversion model has emerged.

"This requires further precision matching of products and service execution to drive sales results through these two links. Once the driving force is formed and the whole model is rolled up, it will be a very solid achievement. "Guo Tingli said, on the basis of customer trust, when he needs short-term protection, he will immediately find us to recommend, make decisions with little screening, or even if he doesn’t spend money at a new stop, he can come for multiple consultations, and the customer will eventually land at a new stop.

At present, with product matching and service as a two-wheel drive, Xinyizhan is targeting the family.

"Our direction is becoming clearer and clearer: around the field of family services, take the insurance-matching and service links to the extreme."

Guo Tingli explained to Blue Whale Insurance that, dissected, a family is an orderly organization bound by human ties. Once the family is modeled, each member has an identity and characteristics of their own, and members with different identities need different coverage.

From Huimin Bao policies that cover family members, to medical-insurance accounts that in some regions can now be shared with parents, family-based protection is steadily extending and being shared. That is the underlying logic behind Guo Tingli's plan for Xinyizhan to provide comprehensive protection targeted at the family.

Service without the hard sell: refining the sample database down to differences in doctors' discharge summaries

Product homogenization has long been a problem the industry finds hard to crack. Insurance institutions have therefore turned to back-end services, landing softly with consumers through "warmth."

"Compared with sales, service sounds less aggressive. Consumers would rather hear what services you can provide than what products you want to sell them," Guo Tingli told Blue Whale Insurance, taking critical-illness claims as his example. An insurance institution needs its own professional claims team, equipped with knowledge of medicine, medical-insurance policy and reimbursement rules.

But that alone is not enough; where does further specialization show? Guo Tingli takes Nanjing, one of the company's key cities, as an example. Local service staff learn in detail how different doctors label the same disease, how discharge-summary formats differ from doctor to doctor, and so on, enriching the sample database with a large volume of cases to achieve intensive cultivation.

Serving customers nationwide, simple case accumulation no longer applies; the rules must be distilled. This, Guo Tingli argued, is where technology and the Internet come in: "The idea is that staff need not carry highly specialized knowledge themselves, but the system behind them must distill that knowledge and solve problems for them. On the technical side, sorting the claims data out clearly enough helps the relevant staff move faster."

Service means doing every link solidly, doing what customers actually care about, and carrying it out with accumulated professionalism.

Why production and sales are hard to separate: one reason is that the industry's carriers are "not confident"

At the end of 2020, the China Banking and Insurance Regulatory Commission issued the Notice on Matters Related to the Development of Independent Personal Insurance Agents. Different from the traditional agent model, the independent agent is seen as an important way to fix the hierarchical relationships and professionalism problems the industry has long been criticized for. Since the beginning of this year, some insurers have launched independent-agent models along with talent recruitment and cultivation plans.

Based on this, Blue Whale Insurance consulted Guo Tingli from the perspective of insurance agency in an exclusive interview.

"The industry has called for separating production from sales for a long time, but they have stayed inseparable, partly because carriers lack a clear understanding of the market." Guo Tingli calls this a kind of lack of confidence: companies worry that once they give up the sales link, they will face product attrition and service failure.

Therefore, the long-term phenomenon in the industry is that channels are king and channels customize products.

Guo Tingli sees independent agents as a radical remedy: let each independent agent become an ambassador in the market, build professionalism in insurance, healthcare, finance and other fields, and form a collective. When that collective can negotiate with insurers as equals, rather than as the sales arm downstream of the industry, the separation of production and sales will follow naturally from the market.

This process may not be short, but viewed long-term it spans every insurance channel: individual agents, bancassurance and the Internet. Guo Tingli believes relatively concentrated single channels may persist, but the overall market structure is bound to grow more and more dispersed, with more small-but-beautiful teams.


5G media convergence refreshes the speed of communication: Central Video teamed up with China Telecom to open a new horizon for the Asian Games.

The first Asian Games with "Chinese characteristics, Asian style and splendor" concluded a few days ago. During the 16-day "Asian Games time," Central Video, CCTV's 5G new-media platform and a rights-holding broadcaster for the Hangzhou Asian Games, not only carried live broadcasts of every event but also joined China Telecom to launch the innovative 5G Media Asian Games Daily, building an all-round, companion-style, immersive viewing experience and leaving audiences with layered, three-dimensional memories of the Hangzhou Asian Games.

During the Hangzhou Asian Games, Central Video's total event broadcast volume hit a new high, with 810 events streamed live. The 5G Media Asian Games Daily was also popular, reaching more than 28 million telecom users with an average daily click-through rate of 7.4%, becoming an important window for users to enjoy the sports feast, feel the spirit of the Games and widen their Asian Games horizons.

Updated daily, the 5G Media Asian Games Daily restored the excitement of the stadium from multiple dimensions. Its main contents were the schedule forecast, highlight moments and stories of Asian Games athletes, supplemented by the medal table, the schedule, an Asian Games special zone, Central Video membership and "wing lighting" sections, plus a quick entry to the Central Video mobile site. With refined operation of the massive volume of events, information was at users' fingertips, meeting netizens' individualized, real-time and interactive needs.

(5G message page)

(Look at the page of Asian Games Daily)

(Asian Games Daily Medal List Page)

As an official partner of the Hangzhou Asian Games, China Telecom escorted the Games with its powerful cloud-network capability, combining comprehensive network and service advantages to accelerate the integration of 5G+AI and applying digital technology to venue management, event experience, media communication, daily training and other scenarios, safeguarding the smart Asian Games from every direction and angle. The "Brand Interaction" zone of the Central Video 5G Media Asian Games Daily showcased China Telecom's Asian Games technology through multi-dimensional reporting, genuinely connecting brand communication with 5G media-convergence coverage.

(Asian Games Daily China Telecom Brand Interactive Zone)

With sports as the medium, event broadcasting and innovative technology are sparking new chemistry. Riding the Asian Games, the combined reach of Central Video and China Telecom expanded further, forming a distinctive commercial scale, enriching the commercial landing scenarios of 5G messaging, and delivering a marketing solution that provides customized message content for industry customers.

At present, Central Video's own 5G messaging platform has officially launched; it can send 5G messages precisely to selected audiences, opening a new content-distribution channel. By building the platform, Central Video aggregates the combined effect of "5G messaging platform + whole-network users + hot content," greatly enriching the application scenarios of 5G messaging in the media industry and bringing users immersive events and lively interactive experiences. At the same time, it serves as a channel for 5G messaging, upgrades traditional media, and strengthens the market competitiveness of media services. The station thus plays a leading, demonstrative role among mainstream media in developing 5G media-convergence services, deploying infrastructure and integrating customer-facing application platforms, with broad application scenarios and notable social and economic benefits.

Why "social cow" online, but "social fear" offline?

Some young people joke about being a "social cow" (supremely outgoing) online while suffering "social fear" (social anxiety) in reality. Why this social two-facedness?

In the digital living space, what kind of alienation has taken place between social rules and communication logic?

How is it related to the development process and methods of contemporary youth’s social competence?

How should the government, society and enterprises form a joint force to resolve the potential risks of cognitive narrowing, polarization, weakening of thinking ability and social attributes brought by information narrowing among young people?

Today, with the rapid development of information technology, some young people live in a dual state: a "social cow" online, "social fear" offline. Their willingness to socialize offline weakens and their social skills atrophy; face-to-face interaction leaves them nervous, stressed and uncomfortable, producing social anxiety. In the online world, however, these same "socially fearful" youths turn into "social cows," quickly hitting it off with strangers.


Two-faced youth indulge in network socialization

A number of interviewed experts said this two-faced social behavior is closely tied to features of the online world: the "others visible, I stay hidden" asymmetry and a selective "partial diet" of information.

On the one hand, unlike socializing in person, online socializing offers concealment, which gives people a certain sense of security and makes it easy to become a "social cow" in cyberspace.

According to Zong Chunshan, director of the Beijing Youth Legal and Psychological Counseling Service Center, the essence of "social fear" is to care too much about the evaluation of others. In the past, in family education and school education, external evaluation was often overemphasized, which led some children to rely too much on external factors and ignore the value and significance of self-evaluation when establishing self-awareness. At the same time, some compulsory and mandatory education methods will also discourage children from actively expressing themselves.

The Internet, by contrast, is anonymous; online socializing has that "others visible, I stay hidden" quality, so people easily maintain a safe social distance, which helps young people feel secure and break through psychological barriers.

On the other hand, absorbing vast amounts of online information easily spins an "information cocoon." A cocoon of useless or harmful information, in particular, can absorb some young people so completely that the real society, whose rules and standards differ from cyberspace's, comes to feel unfamiliar and uncomfortable, turning them into "social-phobes."

The information cocoon is a concept proposed by the American scholar Cass Sunstein in 2006: if you attend only to fields you have chosen, a single information source, or whatever pleases you, you will in the long run "cocoon yourself," sealed like a silkworm inside a narrow information space of your own choosing.

In one survey of 1,341 college students, about 53.1% of respondents felt their information horizons were limited, but only 19.5% would actively search other fields after noticing the limitation, while 46.9% were content to accept algorithmic recommendations and focus only on their own areas of interest.

This means the "information cocoon" is, to some extent, the result of personal choice.

Huang Haiyan, an associate professor at the School of Humanities, Jiangxi University of Finance and Economics, said some young people tend to consume online information according to personal preference, producing a "partial diet" of information. The values, worldview and outlook on life they display online can then split from, or contradict, those they practice in real work and life. Because switching cognitive modes quickly and smoothly is hard, young people accustomed to digital living often choose to socialize less in the real world, even becoming "socially fearful" youth. More seriously, they may detach from the real world and sink into the "comfort zone" of the Internet, deepening the two-facedness of their social lives.

In addition, due to the educational environment, growth experience, social space and other reasons, if teenagers fail to establish good real social habits from an early age, they will easily show fear of difficulties in real communication and turn to cyberspace for approval.

Network socialization will form a kind of presence feeling similar to collective consciousness through common focus and shared emotional state. For those who are insecure and eager for intimacy, it is not only a catharsis and compensation to get rid of the real social dilemma, but also a strong emotional experience and emotional stimulation.

Chu Zhaohui, a researcher at the Chinese Academy of Educational Sciences, said that increasing the frequency of social interaction is the basis for learning to establish meaningful relationships with others. Real social behaviors, such as sports and collective labor, are more closely linked and powerful than online social activities.

"Weak communication," and the "constraints" on cognition, thinking and social life

It is worth noting that some teenagers are easily drawn into online social interaction. Experts believe this reflects a quiet change in the logic of information dissemination in the Internet age.

Huang Chuxin, deputy director and secretary general of the New Media Research Center of China Academy of Social Sciences, said that based on the media ecology in the digital age, the "one-to-many" communication logic of mass communication in the traditional era has changed, and now the communication ecology is more "one-to-one" and "many-to-one". In the era of "everyone is media", communication logic pays more attention to verticalization, personalization and immersion.

Some scholars believe this change in communication logic produces a phenomenon of "weak communication," in which strength in the world of public opinion is inverted relative to the real world: the strong in reality are the weak in the opinion field, while the weak in reality more easily become the strong there.

Experts believe that some young people, cocooned within virtual and real socializing, digital socializing and anthropomorphized socializing, become over-dependent on the online environment and struggle to mark the boundary between the real world and the online one. Over time this may impose three "constraints," on cognition, on thinking and on social life, hindering a comprehensive, rational understanding of the objective world.

One-sided information infiltration, cognitive constraints. Chen Zhiqiang, dean of the School of Culture and Communication, Zhejiang Wanli University, believes that the accumulation of information recommended by intelligent algorithms may lead to the narrowing or even polarization of some young people’s horizons, and the repeated acceptance of the same or similar views will easily lead to cognitive prejudice.

In addition, the infiltration of fragmented information constantly stimulates the audio-visual senses, and it is easy for some young people to reduce their concentration; Even learning information will affect the establishment of the overall knowledge structure because of the scattered information.

Indulge in homogeneous information, thinking is constrained. In cyberspace, people gather together based on ideas, opinions and tastes, accompanied by a lot of homogeneous content. Immersed in it for a long time, it is easy to strengthen self-repetition and self-cognition, and weaken the tolerance of heterogeneous content.

Qiu Ling, deputy secretary of the Jiangxi Provincial Committee of the Communist Youth League, said endless phone-scrolling is especially common among young people with weak self-discipline. Algorithms increasingly cater to audience appetites, pushing content from a few fields at high frequency, most of it entertainment. Over time, users' thinking ability may be constrained.

Social attributes weaken, and social life is constrained. Yang Fengchi, a professor of psychology at Capital Medical University, said the virtual world cannot replace real life: some 80% of human communication relies on nonverbal information, online socializing crowds out and constrains real communication, and social resources accumulated online rarely convert into real-world relationships, making genuine interpersonal competence hard to build.

On the other hand, people often selectively present a "net self" different from their true selves, which will make the interaction between people lose its authenticity and reduce the human factors in communication.

Better "together" with others

Behind some teenagers' over-dependence on online socializing lies an education out of step with social development.

Many interviewed experts said that, compared with their long immersion in the online world, some teenagers' education in the real world is incomplete. At school and at home alike, education mostly means imparting knowledge, and academic performance often becomes the main yardstick of whether a child is excellent. Teenagers get too little time to play with their peers, yet social skills are naturally acquired precisely through close competition and cooperation with peers.

Chu Zhaohui said that social interaction is human nature, and social skills do not need training. Children naturally have the instinct to communicate with others from birth. As long as they seize the sensitive period of social interaction, create opportunities for children to communicate with each other in a timely manner, increase the frequency of social interaction, and deepen the experience of social interaction in the process of growing up, children can gradually learn how to contact, get along with, blend and live with others.

If parents neglect that children should always be in contact with others and nature, it is easy to miss the sensitive period of developing social communication ability. Even if they study and train passively in the future, it is difficult to make up for the lack of social atmosphere in childhood.

Chu Zhaohui said that there are too few opportunities for "double-faced" youth to get in touch with society, understand social operation rules and social needs, and there is a gap between them and social needs, which may affect their integration into society in the long run.

The experts interviewed reminded that the real purpose of parenting is to prepare children for the life of adults. This preparation must be all-round.

Experts suggest children move beyond rote textbook learning that brings "no growth," re-examine themselves, find their strengths and potential, define their own position, think about careers, majors and ambitions with more detachment, and survey society and the world with a broader vision.

In other words, the phenomenon of "double-faced" youth reminds us that what we really need to be wary of is not young people’s online social interaction, but how to make young people know themselves better, know the world better, "be together" with others, and better find a suitable position for themselves in the long-term development of society.

Published in Outlook, No.22, 2022

Starting from Sora, comprehensively interpret the development history of AI video model.

Text | Silicon Valley 101

Sora, OpenAI's AI video-generation model, attracted global attention as soon as it was released on February 15, 2024. An author of an AI-video paper in Silicon Valley (not a Sora author) commented: very good, without question the No. 1.

What is so good about Sora? What are the development challenges of generative AI video? Is OpenAI's video-model route necessarily the right one? Is there consensus on the so-called "world model"? In this episode, through interviews with front-line AI practitioners in Silicon Valley, we dig into the history of the different schools of generative AI video, the disputes among them, and the routes ahead.

We actually wanted to cover this topic last year, because in conversations with many people, including VC investors, we found the difference between AI video models and a large language model like ChatGPT was not well understood. Why didn't we? Because at the end of last year the best the market had to offer was Runway's Gen-1 and Gen-2, with video-to-video and text-to-video generation, and the results we got were... complicated.

For example, in one video Runway generated, the prompt was "super mario walking in a desert," and the result looked like this:

What do you think? Mario looks like he is jumping on the moon. Gravity, friction: physics seems to vanish from this video.

Then we tried another prompt: "A group of people walking down a street at night with umbrellas on the windows of stores." Investor Garrio Harrison had tried the same prompt, and the resulting video looked like this:

Look at this umbrella floating in mid-air; weird, isn't it? Yet this was Runway, the state of the art last year. Later, Pika Labs, founded by Chinese founder Demi Guo, became a brief sensation. It is considered slightly better than Runway, but it is still limited to clips of 3-4 seconds, and its videos still show defects in logical understanding and in rendering hands.

So before OpenAI released Sora, generative AI video had never seized global attention the way chat- and text-based applications like ChatGPT and Midjourney did, largely because the technical difficulty is so high. Video is two-dimensional space plus time: going from static to dynamic, frame to frame across time segments, demands not only powerful algorithms and compute but also solutions to a series of hard problems, including consistency, coherence, physical plausibility and logical plausibility.

Therefore, the topic of generative video model has always been on the list of topics of Silicon Valley 101, but it has been delayed. We want to do this topic again when there is a major breakthrough in generative AI video model. Unexpectedly, this moment is coming so soon.

What Sora showed simply crushes the earlier Runway and Pika Labs.

First, the biggest and most intuitive breakthrough: the length of generated video is greatly extended. Runway and Pika could only generate 3-4 seconds, which is so short that the AI video works that broke through before were mostly fast-cut movie trailers; any use needing longer footage simply could not be served.

On Runway and Pika, getting a longer video means repeatedly prompting to extend the clip, but Jacob, our post-production video editor, found a big problem with that.

Jacob, Silicon Valley 101 video post-editor:

The pain point is that as you keep extending, the later footage deforms, the frames before and after become inconsistent, and the material becomes unusable.

Sora's technical report and demos show it can generate a scene of about one minute directly from the prompt, while handling changes of character and scene and keeping the video's subjects consistent. Our editor was excited after watching.

Jacob, Silicon Valley 101 video post-editor:

One of Sora's videos shows a girl walking down a street in Tokyo... For me, that is very powerful. Even in dynamic motion, as the camera moves and rotates through space, the people and objects in Sora's video stay consistent within the scene.

Third, Sora can accept video, images or text prompts as input and generate video from them, like the billowing cloud in one released demo. This means Sora can animate a static image, and can extend a video forward or backward in time.

Fourth, Sora can ingest and sample widescreen or vertical videos, and can output the same video at different aspect ratios while keeping the style stable, like the sample of the little turtle. This would genuinely help our post-production: today a 1920x1080 horizontal video for YouTube or bilibili has to be re-cut into a vertical 1080x1920 video for short-video platforms like Douyin and TikTok, but conceivably Sora could one day convert it with one click, a feature I am looking forward to.

Fifth, long-range and temporal coherence are stronger. Temporal coherence used to be extremely hard for AI-generated video, but Sora remembers the people and objects in a scene: even if they are briefly occluded or leave the frame, the video stays coherent with physical logic when they reappear. In Sora's puppy video, for example, people walk past and fully occlude the dog, yet when it reappears it continues moving naturally, preserving continuity of time and object.

Sixth, Sora can simulate simple actions that change the state of the world. A painter's new brushstrokes persist on the canvas over time; a bite leaves marks on a hamburger. An optimistic reading is that the model has acquired some common sense, can "understand" the physical world in motion, and can predict what happens next in a scene.

The stunning updates above greatly raised the outside world's expectations and excitement about generative AI video. Sora still makes logical errors, such as a cat with an extra paw, impossible obstacles in a street scene, or a person facing the wrong way on a treadmill. But compared with earlier generative video systems, Runway, Pika or Google's VideoPoet, Sora is the clear leader. More importantly, OpenAI seems intent on proving through Sora that the "scale works miracles" recipe of piling on compute and parameters also applies to video generation, and that fusing the diffusion model with the large language model yields a new route that could form the basis of the so-called "world model." These views have triggered great controversy and discussion in the AI community.

Next, we review the technical development of generative AI video models and try to analyze how Sora's model works. Is it really the so-called "world model"?

Early AI video generation relied mainly on two models: GANs (generative adversarial networks) and VAEs (variational autoencoders). But the video these methods produced was limited, monotonous and largely static, often with poor resolution, and was nowhere near commercial use, so we will set them aside.

After that, AI video generation split into two technical routes: diffusion models specialized for video, and the Transformer route. Let's start with the diffusion route, where the companies that broke out were Runway and Pika Labs.

Many people don't know that the original version of the most important open-source model, Stable Diffusion, was released by Runway together with a team at the University of Munich, and Stable Diffusion itself is the underlying technical foundation behind Runway's core products, the video-generation tools Gen-1 and Gen-2.

The Gen-1 model, released in February 2023, lets people change the visual style of an existing video by feeding in text or images, for example turning real street footage shot on a phone into a cyberpunk world. In June, Runway released Gen-2, which further let users generate video directly from text prompts.

As the name hints, a diffusion model generates an image or video through gradual diffusion. To explain the principle, we invited Dr. Zhang Songyang, one of the authors of Meta's Make-A-Video model, now working on video generation on Amazon's AGI team.

Dr. Zhang Songyang, co-author of the Meta Make-A-Video model and applied scientist on Amazon's AGI team:

The name "diffusion" in the original paper comes from a physical phenomenon: drop ink into a glass of water and it disperses; that is diffusion. Physically the process is irreversible, but AI can learn such a process and reverse it. The analogy for an image: keep adding noise to a picture until it becomes something like a mosaic, pure noise, and then learn how to turn that noise back into the original picture.

Training such a model to do this in one step would be hard, so it is split into many steps, say 1,000. Add a little noise, learn to predict and remove that noise to recover what came before, then add more noise, and so on. Generation is then iterative: the model removes the noise step by step, slowly. Imagine water and ink fully mixed together; how do you predict, step by step, how it changes back into that original drop of ink? It is the inverse process of diffusion.

Dr. Zhang explained it vividly. The core idea of the diffusion model is to generate a realistic image or video gradually, starting from pure random noise. The process can be divided into four steps:

1) Initialization: the diffusion model begins with a random, noisy image or video frame as the initial input.

2) Diffusion process (also called the forward process): the goal of the forward process is to make the picture progressively less clear until it becomes pure noise.

3) Reverse process (also known as backward diffusion): here we introduce a neural network, such as a UNet built on convolutional neural networks (CNNs), which at each time step predicts the noise that was added to produce the current blurred frame. Removing that predicted noise yields the next, cleaner frame, gradually forming realistic image content.

4) Repetition: repeat the reverse step until an image, or a video of the required length, has been generated.
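The four steps above can be sketched as a toy denoising loop. This is an illustrative sketch, not a trained model: `predict_noise` is a placeholder for the learned network (in practice a UNet), and the update rule follows the standard DDPM form.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000  # number of diffusion steps

def forward_diffuse(x0, t, betas):
    """Forward process: blend the clean image x0 with Gaussian noise at step t."""
    alpha_bar = np.prod(1.0 - betas[: t + 1])
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return xt, noise

def predict_noise(xt, t):
    """Placeholder for the trained network (e.g. a UNet) that predicts the added noise."""
    return np.zeros_like(xt)  # a real model would output its noise estimate here

def reverse_denoise(shape, betas):
    """Reverse process: start from pure noise and iteratively remove predicted noise."""
    x = rng.standard_normal(shape)  # step 1: initialization with random noise
    for t in reversed(range(T)):    # step 4: repeat over all time steps
        eps = predict_noise(x, t)
        alpha = 1.0 - betas[t]
        alpha_bar = np.prod(1.0 - betas[: t + 1])
        # step 3: standard DDPM update, subtracting the scaled noise estimate
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha)
    return x

betas = np.linspace(1e-4, 0.02, T)  # step 2: the forward noise schedule
img = reverse_denoise((64, 64), betas)
print(img.shape)  # (64, 64)
```

With a real `predict_noise` network, the same loop produces an image instead of structured noise; only the placeholder differs.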

The above describes video-to-video or image-to-video generation, which is also the basic technical mode of Runway's Gen-1. If you want to input prompt text to achieve text-to-video, a few more steps are needed.

Take the Imagen model Google released in mid-2022 as an example. Our prompt is "A boy is riding on the rocket." This prompt is converted into tokens and passed to a text encoder; Google's Imagen uses the T5-XXL LLM encoder to encode the input text into embeddings. These embeddings represent our text prompt, but encoded in a way the machine can understand.

These text embeddings are then passed to an image generator, which produces a low-resolution 64×64 image. Imagen then uses a super-resolution diffusion model to upscale the image from 64×64 to 256×256, adds another super-resolution diffusion stage on top of that, and finally produces a 1024×1024 high-quality image closely matching our text prompt.

To sum up briefly: in this process, the diffusion model starts from a random noise image and, during denoising, uses the encoded text to guide the generation of a high-quality image.
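The Imagen-style cascade just described can be summarized as a three-stage pipeline. All the function bodies below are toy stand-ins, not Google's actual API; only the resolution progression (text encoder → 64×64 base image → 256×256 → 1024×1024) follows the description above.

```python
# Hypothetical sketch of the Imagen-style cascade; function names are
# illustrative placeholders, not a real library.
def encode_text(prompt: str) -> list[int]:
    """Stand-in for the T5-XXL encoder: turn the prompt into discrete tokens."""
    return [hash(w) % 32000 for w in prompt.split()]

def base_diffusion(tokens, size=64):
    """Stand-in for the base text-conditional diffusion model: tokens -> 64x64 image."""
    return [[0.0] * size for _ in range(size)]

def super_resolution(image, factor=4):
    """Stand-in for a super-resolution diffusion stage: upscale by `factor`."""
    n = len(image) * factor
    return [[0.0] * n for _ in range(n)]

emb = encode_text("A boy is riding on the rocket")
img64 = base_diffusion(emb)         # base stage: 64x64 image
img256 = super_resolution(img64)    # first super-resolution stage: 64 -> 256
img1024 = super_resolution(img256)  # second super-resolution stage: 256 -> 1024
print(len(img1024))  # 1024
```

The design point is that each stage is a separate diffusion model, so the expensive high-resolution work only happens after the cheap base stage has fixed the composition.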

And why is generating video so much harder than generating images?

Dr. Sean Song Zhang, one of the authors of Meta's Make-A-Video model and an applied scientist on the Amazon AGI team:

The principle is actually the same; the only difference is that there is an extra time axis. The picture we just talked about is 2D: height and width. Video has an additional timeline, so it is 3D: height, width, and time. In learning the inverse diffusion process, what used to be a 2D inverse process now becomes a 3D one. That is the difference.

So the problems that exist for images, such as whether generated faces look real, exist for video too. Video also has some unique problems, such as whether the subject of the picture stays consistent, as you just mentioned. For things like scenery, I think the current results are actually fine; once people are involved it gets harder, because the requirements for people are more fine-grained. That is one problem. Another current difficulty, and a direction everyone is working hard on, is how to make videos longer. Right now, generating only two, three, or four seconds of video is far from enough for real application scenarios.
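The extra time axis Dr. Zhang describes is easy to see in tensor shapes. The numbers below are arbitrary examples, not any particular model's settings:

```python
import numpy as np

# An image is a 2D grid of pixels (plus color channels): height x width x channels
image = np.zeros((256, 256, 3))

# A video adds a time axis: frames x height x width x channels
video = np.zeros((16, 256, 256, 3))  # e.g. 16 frames of 256x256 RGB

print(image.ndim, video.ndim)  # 3 4

# The reverse diffusion process must now denoise along time as well as space,
# which multiplies memory: 16 frames cost 16x the pixels of one image.
print(video.size // image.size)  # 16
```

This linear growth in memory with frame count is exactly why clip length is limited, a point the interview returns to below.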

Compared with earlier models such as GANs, the diffusion model has three main advantages:

First, stability: the training process is usually more stable and less prone to problems such as mode collapse.

Second, image quality: diffusion models can generate high-quality images and videos, especially when sufficiently trained, and the results are usually more realistic.

Third, no specific architecture required: the diffusion model does not depend on a particular network structure and is compatible with many different types of neural networks.

However, the diffusion model also has two main shortcomings:

First, high training cost: compared with some other generative models, a diffusion model can be more expensive to train, because it must learn to denoise at many different noise levels, which takes longer.

Second, slower generation: it must denoise step by step to produce an image or video, rather than generating the whole sample in one pass.

Dr. Sean Song Zhang, one of the authors of Meta's Make-A-Video model and an applied scientist on the Amazon AGI team:

One of the most important reasons we can't generate long videos right now is that GPU memory is limited. Generating one image takes up part of that memory, and generating 16 frames may almost fill it. When you need to generate more frames, you have to find a way to take the information already generated into account and predict what should come next, which first of all places higher demands on the model. Compute is also a problem. Of course, years from now memory may be very large and perhaps this problem will disappear. That's possible, but for now we need better algorithms; with better hardware, the problem might not exist.

So the current video diffusion model is, by its nature, probably not the best algorithm, even though representative companies such as Runway and Pika Labs keep optimizing it.

Next, let's talk about the other faction: the video generation route based on large language models built on the Transformer architecture.

At the end of December 2023, Google released VideoPoet, a generative AI video model based on a large language model, which at the time was seen as another solution, a way out besides the diffusion model, in the field of video generation. What is the principle? How does a large language model generate video?

Large language models generate video by understanding the temporal and spatial relationships in video content. Google's VideoPoet is one example. Let's again invite Dr. Sean Song Zhang, a generative AI scientist, to give us a vivid explanation.

Dr. Sean Song Zhang, one of the authors of Meta's Make-A-Video model and an applied scientist on the Amazon AGI team:

The large language model is completely different in principle. It was first used on text: I predict what the next word is. For example, given "I love telling the ___", what is the last word? Guess what it is. The more words you are given up front, the easier it is to guess what comes next; with fewer words, there is more room for the model to improvise. It is that kind of process.

This idea is then brought to video: we can learn a vocabulary of the image, or of the video. That is, we cut the picture into a grid, say 16 cells across and 16 cells down, treat each small square as a word, and feed those into the large language model to learn. If you already have a good large language model, it then learns how these visual words interact with the words in text, and what associations exist between them. Once the model has learned this, we can use it for video tasks as well as text tasks.

Simply put, the VideoPoet model based on a large language model works like this:

1) Input and understanding: first, VideoPoet receives text, sound, images, depth maps, optical-flow maps, or a video to be edited as input.

2) Video and sound encoding: text is discrete, so a large language model naturally requires that inputs and outputs be discrete features. Video and sound, however, are continuous. To let the large language model also take images, video, or sound as input and output, VideoPoet encodes video and sound into discrete tokens. In deep learning, a token is an important concept: a symbol or identifier used to represent a particular element within a set of data. In VideoPoet's case, tokens can be understood as the "words" of video and sound.

3) Model training and content generation: with these tokens, we can train a Transformer to predict video tokens one by one from the user's input, just as it learns to predict text tokens, and the model begins to generate content. For video generation this means the model must create a coherent frame sequence that is not only visually logical but also continuous in time.

4) Optimization and fine-tuning: the generated video may need further optimization and fine-tuning to ensure quality and consistency. This may include adjusting colors, lighting, and transitions between frames. VideoPoet uses deep learning techniques to refine generated videos, ensuring they not only match the text description but are also visually appealing.

5) Output: finally, the generated video is output for end users to watch.
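The "visual words" idea in steps 1–3, cutting a frame into a 16×16 grid and treating each cell as a token, can be sketched as follows. The tokenizer here is a toy stand-in: a real model like VideoPoet maps patches through a learned discrete codebook, whereas this sketch just hashes each patch's mean brightness.

```python
import numpy as np

def frame_to_tokens(frame, grid=16, vocab=1024):
    """Toy visual tokenizer: cut a frame into a grid x grid set of patches and
    map each patch to a discrete token id (a real model uses a learned codebook)."""
    h, w = frame.shape
    ph, pw = h // grid, w // grid
    tokens = []
    for i in range(grid):
        for j in range(grid):
            patch = frame[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            tokens.append(int(patch.mean() * 255) % vocab)  # fake codebook lookup
    return tokens

frame = np.random.default_rng(0).random((128, 128))  # a grayscale frame
tokens = frame_to_tokens(frame)
print(len(tokens))  # 256 "visual words" per frame
```

Once each frame is 256 discrete tokens, a Transformer can be trained to predict them one by one, exactly like next-word prediction on text.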

However, the route of generating video with large language models also has its own advantages and disadvantages.

First, the advantages:

1) Strong understanding: large language models built on the Transformer architecture can process and understand huge amounts of data, including complex text and image information. This gives them the ability to understand and generate across modalities, learning the correlations between text, images, and video, so they can produce more accurate and relevant output when converting text descriptions into video content.

2) Handling long sequences: thanks to the self-attention mechanism, Transformer models are especially good at processing long sequences, which matters for video generation because a video is essentially a long visual sequence.

3) Scalability: generally speaking, the larger the model, the stronger its fitting ability. But once a model is large enough, the performance gains of a convolutional neural network slow down or even stop as the model grows, whereas a Transformer can keep improving. Transformers have proved this point in large language models and are now gradually emerging in image and video generation.

Now the disadvantages:

1) Resource-intensive: using a large language model to generate video, especially high-quality video, requires enormous computing resources, because this route encodes the video into tokens, and the token count is often far larger than the vocabulary of a sentence or even a paragraph. Predicting those tokens one by one also takes a lot of time. In other words, training and inference for the Transformer model can be expensive and time-consuming.

Dr. Sean Song Zhang, one of the authors of Meta's Make-A-Video model and an applied scientist on the Amazon AGI team:

There is a problem I think is quite fundamental: the Transformer is not fast enough. This is a very essential issue, because the Transformer predicts block by block, while the diffusion model produces its output directly, so the Transformer will definitely be slower.

Chen Qian, host of the Silicon Valley 101 video channel:

Too slow, is there concrete data? Just how much slower?

Dr. Sean Song Zhang, one of the authors of Meta's Make-A-Video model and an applied scientist on the Amazon AGI team:

For example, with diffusion, drawing a single picture still takes some iterative process, but say I use four steps, four denoising passes, to generate it; at present, if done well, I think four steps already gives a decent result. But if you use a Transformer and treat the picture as a 16×16 grid, that's 16×16 = 256 predictions. That's the speed difference.

Four means I ran the denoising iteration four times. With the Transformer, predicting one 16×16 picture means predicting 256 words. Their dimensions are certainly different, but you can compare the complexity: for the diffusion model, the number of passes is a constant, while the Transformer's complexity is width × height, which changes with the image. So in complexity terms the diffusion model is definitely better. Concretely, I think the bigger the picture and the higher the resolution, the bigger the Transformer's problem becomes.
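The speed gap Dr. Zhang describes is just a matter of counting sequential model calls. A toy comparison using his own numbers (four denoising passes versus a 16×16 token grid; these are illustrative, not benchmarks):

```python
# Rough counts of sequential model calls, from the discussion above.
diffusion_steps = 4              # a well-distilled diffusion model: ~4 denoising passes
grid = 16                        # autoregressive model: predict a 16x16 token grid
transformer_steps = grid * grid  # one model call per token

print(transformer_steps)                    # 256
print(transformer_steps / diffusion_steps)  # 64.0x more sequential calls

# And the gap grows with resolution: diffusion stays at a constant number of
# passes, while the token count scales with width x height.
grid_hd = 32
print((grid_hd * grid_hd) / diffusion_steps)  # 256.0
```

This is the "constant versus width × height" complexity contrast in the quote: doubling the grid resolution quadruples the Transformer's calls but leaves the diffusion pass count unchanged.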

Other problems with the Transformer model include:

2) Quality fluctuation: although Transformer models can generate creative video content, output quality may be unstable, especially for complex scenes or insufficiently trained models.

3) Data dependency:The performance of Transformer model depends largely on the quality and diversity of training data. If the training data is limited or biased, the generated video may not accurately reflect the input intention or be limited in diversity.

4) Understanding and logical limits: although Transformer models have made progress in understanding text and image content, they may still struggle to fully grasp complex human emotions, humor, or subtle social and cultural cues, which can affect the relevance and appeal of the generated video.

5) Ethics and bias: automatic video generation may inadvertently reproduce or amplify biases in the training data, raising ethical problems.

Speaking of that fifth point, I'm reminded of recent news: no matter whom you asked Google's multimodal model Gemini to draw, it produced people of color, including the American Founding Fathers, a Black female version of the Pope, Vikings depicted as people of color, and even a generated Elon Musk who was Black.

The reason behind this may be that, in trying to correct bias in the model, Google added AI ethics and safety adjustment instructions and overdid it, producing this embarrassing blunder. Coming right after OpenAI released Sora, the incident left Google widely ridiculed.

However, insiders also point out that the five problems above are not unique to the Transformer architecture; at present any generative model can have them, with the strengths and weaknesses of different models simply distributed a bit differently.

So, to sum up: videos generated by the diffusion model and by the Transformer model each have their unsatisfying aspects. How, then, do the companies at the technological frontier handle this? You may have guessed: the two models each have their own advantages, so if we combine them, will 1 + 1 > 2? Indeed, Sora is a combination of the diffusion model and the Transformer model.

To tell the truth, the details of Sora are still unknown to the outside world, and it is not open to the public; there is not even a waitlist. Only a few people from the industry and design communities have been invited to use it, and the videos they produce are posted publicly online. On the technology side, analysis is mostly guesswork based on the demo videos OpenAI released. OpenAI gave only a vague technical explanation on the day Sora was released, with many technical details missing.

But let's start from the technical analysis OpenAI published about Sora and look at how its diffusion-plus-large-language-model route works.

Sora's report makes it clear at the outset: OpenAI "jointly trains text-conditional diffusion models" on videos and images of variable duration, resolution, and aspect ratio, while using a Transformer architecture that operates on spacetime patches of video and image latent codes.

So, the steps of Sora's generation pipeline include:

Step 1: Video compression network

In the LLM-based video generation route above, we mentioned encoding video into discrete tokens, and Sora adopts the same idea here. Video is a three-dimensional input (two spatial dimensions plus one time dimension); the video is divided into small tokens in this 3D space, which OpenAI calls "spacetime patches".
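Splitting a video volume into spacetime patches might look like this toy sketch. The patch sizes here are arbitrary illustrations; OpenAI has not published Sora's actual values.

```python
import numpy as np

def spacetime_patches(video, pt=2, ph=16, pw=16):
    """Cut a (time, height, width, channels) video into spacetime patches:
    small 3D blocks spanning pt frames and a ph x pw spatial window."""
    T, H, W, C = video.shape
    patches = []
    for t in range(0, T, pt):
        for y in range(0, H, ph):
            for x in range(0, W, pw):
                patches.append(video[t:t + pt, y:y + ph, x:x + pw, :])
    return patches

video = np.zeros((8, 64, 64, 3))  # 8 frames of 64x64 RGB
patches = spacetime_patches(video)
print(len(patches))       # (8/2) * (64/16) * (64/16) = 64 patches
print(patches[0].shape)   # (2, 16, 16, 3)
```

Each patch plays the same role as a token in a language model: a unit the Transformer can attend over, but one that spans both space and a slice of time.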

Step 2: Text Understanding

Because Sora has the support of OpenAI's DALL·E 3 model, many videos without text annotations can be automatically captioned and used for training. At the same time, thanks to GPT, a user's input can be expanded into a more detailed description, making the generated video match the user's intent more closely; the Transformer framework also helps the Sora model learn and extract features more effectively, capture and understand a great deal of detail, and generalize better to unseen data.

For example, if you type "a cartoon kangaroo is dancing disco", GPT may elaborate by free association: the kangaroo should wear sunglasses and a Hawaiian shirt, with a crowd of animals dancing alongside it in the disco. How well Sora generates is therefore partly determined by how rich the explanations and details GPT can develop. And the GPT model is OpenAI's own. Unlike other AI video startups that have to call GPT externally, the efficiency and depth of the GPT integration OpenAI gives Sora are surely the highest, which may be why Sora does better in semantic understanding.

Step 3: Diffusion Transformer generation

Sora adopts the combination of Diffusion and Transformer.

Earlier, in the section on LLM-based video generation, we mentioned that the Transformer scales well: its performance keeps improving as the model grows. Not every architecture has this property; for example, once a convolutional neural network is large enough, its performance gains slow down or even stop as the model grows further, while the Transformer can keep improving.

Many people have noticed that Sora shows stable ability in maintaining object stability, consistency, and camera rotation, far exceeding the Diffusion-based video models from Runway, Pika, Stable Video, and others.

Recall that when we discussed the diffusion model, we also said the challenge of video generation lies in the stability and consistency of the generated objects. Although Diffusion is the mainstream of video generation technology, previous work was limited to convolution-based network structures and did not reach its full potential. Sora skillfully combines the advantages of Diffusion and Transformer, giving video generation technology a much bigger boost.

Furthermore, the continuity of Sora's videos may come from the Transformer's self-attention mechanism. Sora can discretize time and then understand relationships across the timeline through self-attention. The principle of self-attention is that each time point is related to every other time point, something the diffusion model alone does not have.
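The "each time point is related to all other time points" property can be seen in a minimal self-attention computation. This sketch is deliberately stripped down: a single head with no learned query/key/value projections, just so the all-pairs attention matrix is visible.

```python
import numpy as np

def self_attention(x):
    """Minimal single-head self-attention without learned projections:
    every time step attends to every other time step."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # (T, T): similarity of all pairs of time steps
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over time
    return weights @ x, weights

frames = np.random.default_rng(0).random((16, 8))  # 16 time steps, 8-dim features
out, attn = self_attention(frames)
print(out.shape, attn.shape)  # (16, 8) (16, 16)
```

The (16, 16) attention matrix is the point: row t holds the weights time step t places on every other time step, which is what lets a Transformer enforce consistency along the whole timeline rather than frame by frame.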

There is also speculation that, in the third step of the diffusion process we described earlier, Sora replaces the U-Net architecture with a Transformer. This lets the diffusion model, as the "painter", draw on OpenAI's massive training data while reversing the diffusion and removing noise, finding the most probable content corresponding to the keyword features in the prompt.

When I interviewed another AI practitioner, he used a different vivid analogy to explain the distinction. He said: "The diffusion model predicts noise. Subtract the predicted noise from the picture at a given time step and you get the original noise-free picture, which is the final generated image. It is more like sculpture. As Michelangelo said, he merely removed the parts of the stone that should not be there, according to God's will, and great sculptures emerged from it. The Transformer, through its self-attention mechanism, understands the connections along the timeline and lets this sculpture step down from its stone pedestal." Isn't that a vivid image?

Finally, Sora's Transformer-plus-diffusion model generates the spacetime patches, the patches are assembled into frames, the frames are spliced into a video sequence, and a Sora video is produced.

Honestly, the Transformer-plus-diffusion methodology is not OpenAI's invention. Before OpenAI released Sora, when we interviewed Dr. Sean Song Zhang in January this year, he mentioned that combining the Transformer with the diffusion model was already widely studied in the industry.

Dr. Sean Song Zhang, one of the authors of Meta's Make-A-Video model and an applied scientist on the Amazon AGI team:

At present we can see some models combining the Transformer with diffusion, and the results may be quite good; in some papers they may even be better. So I'm not sure how models will develop in the future, but I think combining the two may be one way forward. The Transformer, for example, has a natural advantage in predicting what the next segment of video will become, while diffusion offers high quality but in most current practice still generates a fixed number of frames. How to combine the two is something that will be studied going forward.

This also explains why OpenAI chose to unveil Sora now. In fact, in OpenAI's forum, it was clarified that Sora is not yet a mature product: it is not a released product, it is not public, and there is no waitlist and no expected release date.

Some outside analysts believe Sora is still immature, and that OpenAI's computing capacity might not withstand making Sora public; public release would also bring misinformation, safety, and ethical problems. So Sora may not be officially released soon. But because Transformer plus diffusion has become a direction the whole industry is trying, OpenAI needed to demonstrate Sora's capability now to reclaim its leading position in an increasingly competitive generative AI video field.

With OpenAI's validation, we can be fairly sure that the direction of AI video generation will shift toward this new technical combination. OpenAI also pointed out clearly in its published technical article that the ChatGPT recipe of enormous parameter counts, the "brute force works miracles" approach, has now been proven in AI video generation as well.

OpenAI said in the article: "We found that the video model showed many interesting emerging capabilities during large-scale training. These capabilities enable Sora to simulate some aspects of people, animals, and environments in the real world."

This shows that Sora exhibits the same emergence as GPT-3, which means that, like the GPT language models, AI video needs more parameters, more GPU compute, and more capital investment.

Scaling is still the winning trick of generative AI for now, and it may also mean that generative AI video ultimately becomes a game for big companies.

Dr. Sean Song Zhang, one of the authors of Meta's Make-A-Video model and an applied scientist on the Amazon AGI team:

I think the most intuitive comparison is this: a saved video might be tens of gigabytes, while a large language model can be roughly a thousand times larger, on the terabyte scale, something like that. But you can already see the trend, even though video models' parameter counts are only at the billion level for now.

It's like their earlier Stable Diffusion image model: they later produced Stable Diffusion XL by enlarging the model, and that brought better results, or rather, it could produce more realistic pictures, and the effect was more obvious. I think that is the trend: parameter counts will definitely grow in the future, but how much gain that brings depends on your current model architecture, your data volume, and what your data looks like.

The above is our very preliminary analysis of Sora. Once again, because many of Sora's technical details have not been made public, much of our analysis is guesswork from an external perspective. If anything is inaccurate, please correct us and let's discuss.
